SURFACE EXPRESSION LIBRARIES
OF RANDOMIZED PEPTIDES

BACKGROUND OF THE INVENTION

This invention relates generally to methods for
synthesizing and expressing oligonucleotides and, more
particularly, to methods for expressing oligonucleotides
having random codon sequences.

Oligonucleotide synthesis proceeds via linear coupling
of individual monomers in a stepwise reaction. The
reactions are generally performed on a solid phase support
by first coupling the 3' end of the first monomer to the
support. The second monomer is added to the 5' end of the
first monomer in a condensation reaction to yield a
dinucleotide coupled to the solid support. At the end of
each coupling reaction, the by-products and unreacted, free
monomers are washed away so that the starting material for
the next round of synthesis is the pure oligonucleotide
attached to the support. In this reaction scheme, the
stepwise addition of individual monomers to a single,
growing end of an oligonucleotide ensures accurate synthesis
of the desired sequence. Moreover, unwanted side reactions,
such as the condensation of two oligonucleotides, are
eliminated, resulting in high product yields.

In some instances, it is desired that synthetic
oligonucleotides have random nucleotide sequences. This
result can be accomplished by adding equal proportions of
all four nucleotides in the monomer coupling reactions,
leading to the random incorporation of all nucleotides and
yielding a population of oligonucleotides with random
sequences. Since all possible combinations of nucleotide
sequences are represented within the population, all
possible codon triplets will also be represented. If the
objective is ultimately to generate random peptide
products, this approach has a severe limitation because the
random codons synthesized will bias the amino acids
incorporated during translation of the DNA by the cell into
polypeptides.

The bias is due to the redundancy of the genetic code.
There are four nucleotide monomers, which leads to sixty-four
possible triplet codons. With only twenty amino acids
to specify, many of the amino acids are encoded by multiple
codons. Therefore, a population of oligonucleotides
synthesized by sequential addition of monomers from a
random population will not encode peptides whose amino acid
sequence represents all possible combinations of the twenty
different amino acids in equal proportions. That is, the
frequency of amino acids incorporated into polypeptides
will be biased toward those amino acids which are specified
by multiple codons.
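The redundancy described above can be tallied directly from the standard genetic code. The following Python sketch is illustrative only and is not part of the disclosed method; the names CODON_TABLE and codon_counts are hypothetical, and the expected frequencies assume that all four monomers are incorporated with equal probability at every position.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def codon_counts():
    """Number of codons specifying each amino acid (stop codons excluded)."""
    return Counter(aa for aa in CODON_TABLE.values() if aa != "*")

counts = codon_counts()
sense_total = sum(counts.values())
# List amino acids from most to least redundant with their expected frequency.
for aa, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(f"{aa}: {n} codons, expected frequency {n / sense_total:.3f}")
```

Under these assumptions, amino acids specified by six codons (Leu, Ser and Arg) are expected six times as often as those specified by a single codon (Met and Trp).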

To alleviate amino acid bias due to the redundancy of
the genetic code, the oligonucleotides can be synthesized
from nucleotide triplets. Here, a triplet coding for each
of the twenty amino acids is synthesized from individual
monomers. Once synthesized, the triplets are used in the
coupling reactions instead of individual monomers. By
mixing equal proportions of the triplets, synthesis of
oligonucleotides with random codons can be accomplished.
However, the cost of synthesis from such triplets far
exceeds that of synthesis from individual monomers because
triplets are not commercially available.

Amino acid bias can be reduced, however, by
synthesizing the degenerate codon sequence NNK, where N is
a mixture of all four nucleotides and K is a mixture of
guanine and thymine nucleotides. Each position within an
oligonucleotide having this codon sequence will contain a
total of 32 codons (12 encoding amino acids being
represented once, 5 represented twice, 3 represented three
times and one codon being a stop codon). Oligonucleotides
expressed with such degenerate codon sequences will produce
peptide products whose sequences are biased toward those
amino acids being represented more than once. Thus,
populations of peptides whose sequences are completely
random cannot be obtained from oligonucleotides synthesized
from degenerate sequences.
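The NNK tally given above can be checked by enumerating the codons. The following Python sketch is a minimal illustration under the standard genetic code; the helper name degenerate_codons is hypothetical and is not taken from the source.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def degenerate_codons(first, second, third):
    """Expand a degenerate codon given the bases allowed at each position."""
    return ["".join(c) for c in product(first, second, third)]

nnk = degenerate_codons("ACGT", "ACGT", "GT")          # N, N, K (G or T)
per_amino_acid = Counter(CODON_TABLE[c] for c in nnk)
stops = per_amino_acid.pop("*", 0)                      # set stop codons aside
histogram = Counter(per_amino_acid.values())            # codons per amino acid -> how many amino acids
print(len(nnk), stops, dict(histogram))
```

The histogram maps the number of NNK codons for an amino acid to the number of amino acids with that count, which corresponds to the once, twice and three times groups described in the text.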

There thus exists a need for a method to express
oligonucleotides having a fully random or desirably biased
sequence which alleviates genetic redundancy. The present
invention satisfies this need and provides additional
advantages as well.

SUMMARY OF THE INVENTION

The invention provides a plurality of procaryotic
cells containing a diverse population of expressible
oligonucleotides operationally linked to expression
elements, the expressible oligonucleotides having a
desirable bias of random codon sequences.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a schematic drawing for synthesizing
oligonucleotides from nucleotide monomers with random
tuplets at each position using twenty reaction vessels.

Figure 2 is a schematic drawing for synthesizing
oligonucleotides from nucleotide monomers with random
tuplets at each position using ten reaction vessels.

Figure 3 is a schematic diagram of the two vectors
used for sublibrary and library production from precursor
oligonucleotide portions. M13IX22 (Figure 3A) is the
vector used to clone the anti-sense precursor portions.
[...] p/o expression [...] to be combined with [...].
[...] selection and [...] restriction sites are also shown.
M13IX42 (Figure 3B) is the vector used to clone the sense
precursor portions. [...] pseudo-wild type [...] gene VIII
[...] represents the portion of M13IX42 which [...].
[...] amber stop codons and [...] restriction sites are also
shown. Figure 3C shows [...] vector population
from sublibraries to form the [...] vector M13IX. [...]
expression library in a [...] suppressor strain and the
production of phage. The phage [...] to infect a
suppressor strain (Figure 3E) [...] expression and
screening of the library.

Figure 4 [...] generation of surface expression libraries
[...] oligonucleotide populations (M13IX30). The symbols
are as described for Figure 3.

Figure 5 is the nucleotide sequence of M13IX[...] (SEQ ID
NO: 1).

Figure 6 is the nucleotide sequence of M13IX[...] (SEQ ID
NO: 2).

Figure 7 is the nucleotide sequence of M13IX30 (SEQ ID
NO: 3).

Figure 8 is the nucleotide sequence of M13ED03 (SEQ ID
NO: 4).

Figure 9 is the nucleotide sequence of M13IX[...] (SEQ
ID NO: 5).

Figure 10 is the nucleotide sequence of M13ED04 (SEQ
ID NO: 6).

DETAILED DESCRIPTION OF THE INVENTION

This invention is directed to a simple and inexpensive
method for synthesizing and expressing oligonucleotides
having a desirable bias of random codons using individual
monomers. The method is advantageous in that individual
monomers are used instead of triplets and, by synthesizing
only a non-degenerate subset of all triplets, codon
redundancy is alleviated. Thus, the oligonucleotides
synthesized represent a large proportion of the possible
random triplet sequences which can be obtained. The
oligonucleotides can be expressed, for example, on the
surface of filamentous bacteriophage in a form which does
not alter phage viability or impose biological selections
against certain peptide sequences. The oligonucleotides
produced are therefore useful for generating an unlimited
number of pharmacological and research products.

In one embodiment, the invention entails the
sequential coupling of monomers to produce oligonucleotides
with a desirable bias of random codons. The coupling
reactions for the randomization of twenty codons which
specify the amino acids of the genetic code are performed
in ten different reaction vessels. Each reaction vessel
contains a support on which the monomers for two different
codons are coupled in three sequential reactions. One of
the reactions couples an equal mixture of two monomers such
that the final product has two different codon sequences.
The codons are randomized by removing the supports from the
reaction vessels and mixing them to produce a single batch
of supports containing all twenty codons at a particular
position. Synthesis at the next codon position proceeds by
equally dividing the mixed batch of supports into ten
reaction vessels as before and sequentially coupling the
monomers for each pair of codons. The supports are again
mixed to randomize the codons at the position just
synthesized. The cycle of coupling, mixing and dividing
continues until the desired number of codon positions have
been randomized. After the last position has been
randomized, the oligonucleotides with random codons are
cleaved from the support. The random oligonucleotides can
then be expressed, for example, on the surface of
filamentous bacteriophage as gene VIII-peptide fusion
proteins. Alternative genes can be used as well.

In its broadest form, the invention provides a diverse
population of synthetic oligonucleotides contained in
vectors so as to be expressible in cells. Such populations
of diverse oligonucleotides can be fully random at one or
more codon sites or can be fully defined at one or more
sites, so long as at least one site has randomly variable
codons. The populations of oligonucleotides can be
expressed as fusion products in combination with surface
proteins of filamentous bacteriophage, such as M13, as with
gene VIII. The vectors can be transfected into a plurality
of cells, such as the procaryote E. coli.

The diverse population of oligonucleotides can be
formed by randomly combining first and second precursor
populations, each precursor population having a desirable
bias of random codon sequences. Methods of synthesizing
and expressing the diverse population of expressible
oligonucleotides are also provided.

In a preferred embodiment, two populations of random
oligonucleotides are synthesized. The oligonucleotides
within each population encode a portion of the final
oligonucleotide which is to be expressed. Oligonucleotides
within one population encode the carboxy terminal portion
of the expressed oligonucleotides. These oligonucleotides
are cloned in frame with a gene VIII (gVIII) sequence so
that translation of the sequence produces peptide fusion
proteins. The second population of oligonucleotides is
cloned into a separate vector. Each oligonucleotide within
this population encodes the anti-sense of the amino
terminal portion of the expressed oligonucleotides. This
vector also contains the elements necessary for expression.
The two vectors containing the random oligonucleotides are
combined such that the two precursor oligonucleotide
portions are joined together at random to form a population
of larger oligonucleotides derived from two smaller
portions. The vectors contain selectable markers to ensure
maximum efficiency in joining together the two
oligonucleotide populations. A mechanism also exists to
control the expression of gVIII-peptide fusion proteins
during library construction and screening.

As used herein, the term "monomer" or "nucleotide
monomer" refers to individual nucleotides used in the
chemical synthesis of oligonucleotides. Monomers that can
be used include both the ribo- and deoxyribo- forms of each
of the five standard nucleotides (derived from the bases
adenine (A or dA, respectively), guanine (G or dG),
cytosine (C or dC), thymine (T) and uracil (U)).
Derivatives and precursors of bases such as inosine which
are capable of supporting polypeptide biosynthesis are also
included as monomers. Also included are chemically
modified nucleotides, for example, one having a reversible
blocking agent attached to any of the positions on the
purine or pyrimidine bases, the ribose or deoxyribose sugar
or the phosphate or hydroxyl moieties of the monomer. Such
blocking groups include, for example, dimethoxytrityl,
benzoyl, isobutyryl, beta-cyanoethyl and diisopropylamine
groups, and are used to protect hydroxyls, exocyclic amines
and phosphate moieties. Other blocking agents can also be
used and are known to one skilled in the art.

As used herein, the term "tuplet" refers to a group of
elements of a definable size. The elements of a tuplet as
used herein are nucleotide monomers. For example, a tuplet
can be a dinucleotide, a trinucleotide [...] or more
nucleotides.

As used herein, the term "codon" or "triplet" refers to
a tuplet consisting of [...] polypeptide biosynthesis.
The term also includes nonsense, or stop, codons which do
not specify any amino acid.

"Random codons" or "randomized codons," as used
herein, refers to more than one codon at [...]. The number
of different [...] at any particular position.
"Randomized oligonucleotides," as [...] refers to a
collection of oligonucleotides [...] within a randomized
oligonucleotide contains random [...]. For example, if
randomized oligonucleotides are six nucleotides in length
(i.e., two codons) and both the first and second codon
positions are randomized to encode all twenty amino acids,
then a population of oligonucleotides [...] possible [...]
twenty triplets in the first and second [...] makes up the
[...] oligonucleotides. [...] of [...] nucleotides in length
[...] synthesized which have random codon sequences at all
positions encoding all twenty amino acids, then all
triplets coding for each of the twenty amino acids will be
found in equal proportions at every position. The
population constituting the randomized oligonucleotides
will contain 20^[...] different possible species of
oligonucleotides. "Random tuplets," or "randomized
tuplets" are defined analogously.

As used herein, the term "bias" refers to a
preference. It is understood that there can be degrees of
preference or bias toward codon sequences which encode
particular amino acids. For example, an oligonucleotide
whose codon sequences do not preferentially encode particular
amino acids is unbiased and therefore completely random.
The oligonucleotide codon sequences can also be biased
toward predetermined codon sequences or codon frequencies
and, while still diverse and random, will exhibit codon
sequences biased toward a defined, or preferred, sequence.
"A desirable bias of random codon sequences," as used
herein, refers to the predetermined degree of bias which
can be selected from totally random to essentially, but not
totally, defined (or preferred). There must be at least
one codon position which is variable, however.

As used herein, the term "support" refers to a solid
phase material for attaching monomers for chemical
synthesis. Such a support is usually composed of materials
such as beads of controlled pore glass but can be other
materials known to one skilled in the art. The term is
also meant to include one or more monomers coupled to the
support for additional oligonucleotide synthesis reactions.

As used herein, the terms "coupling" or "condensing"
refer to the chemical reactions for attaching one monomer
to a second monomer or to a solid support. Such reactions
are known to one skilled in the art and are typically
performed on an automated DNA synthesizer such as a
MilliGen/Biosearch Cyclone Plus Synthesizer using
procedures recommended by the manufacturer. "Sequentially
coupling," as used herein, refers to the stepwise addition
of monomers.

A method of synthesizing oligonucleotides having
random tuplets using individual monomers is described. The
method consists of several steps, the first being synthesis
of a nucleotide tuplet for each tuplet to be randomized.
As described here and below, a nucleotide triplet (i.e., a
codon) will be used as a specific example of a tuplet. Any
size tuplet will work using the methods disclosed herein,
and one skilled in the art would know how to use the
methods to randomize tuplets of any size.

If the randomization of codons specifying all twenty
amino acids is desired at a position, then twenty different
codons are synthesized. Likewise, if randomization of only
ten codons at a particular position is desired, then those
ten codons are synthesized. Randomization of from two to
sixty-four codons can be accomplished by synthesizing each
desired triplet. Preferably, randomization of from two to
twenty codons is used for any one position because of the
redundancy of the genetic code. The codons selected at one
position do not have to be the same codons selected at the
next position. Additionally, the sense or anti-sense
sequence oligonucleotide can be synthesized. The process
therefore provides for randomization of any desired codon
position with any number of codons.

Codons to be randomized are synthesized sequentially
by coupling the first monomer of each codon to separate
supports. The supports for the synthesis of each codon
can, for example, be contained in different reaction
vessels such that one reaction vessel corresponds to the
monomer coupling reactions for one codon. As will be used
here and below, if twenty codons are to be randomized, then
twenty reaction vessels can be used in independent coupling
reactions for the first monomer of each of the twenty codons.
Synthesis proceeds by sequentially coupling the second
monomer of each codon to the first monomer to produce a
dimer, followed by coupling the third monomer for each
codon to each of the above-synthesized dimers to produce a
trimer (Figure 1, step 1, where M1, M2 and M3 represent the
first, second and third monomer, respectively, for each
codon to be randomized).

Following synthesis of the first codons from
individual monomers, the randomization is achieved by
mixing the supports from all twenty reaction vessels which
contain the individual codons to be randomized. The solid
phase support can be removed from its vessel and mixed to
achieve a random distribution of all codon species within
the population (Figure 1, step 2). The mixed population of
supports, constituting all codon species, is then
redistributed into twenty independent reaction vessels
(Figure 1, step 3). The resultant vessels are all
identical and contain equal portions of all twenty codons
coupled to a solid phase support.

For randomization of the second position codon,
synthesis of twenty additional codons is performed in each
of the twenty reaction vessels produced in step 3 as the
condensing substrates of step 1 (Figure 1, step 4). Steps
1 and 4 are therefore equivalent except that step 4 uses
the supports produced by the previous synthesis cycle
(steps 1 through 3) for codon synthesis whereas step 1 is
the initial synthesis of the first codon in the
oligonucleotide. The supports resulting from step 4 will
each have two codons attached to them (i.e., a
hexanucleotide) with the codon at the first position being
any one of twenty possible codons (i.e., random) and the
codon at the second position being one of the twenty
possible codons.

For randomization of the codon at the second position
and synthesis of the third position codon, steps 2 through
4 are again repeated. This process yields in each vessel
a three codon oligonucleotide (i.e., 9 nucleotides) with
codon positions 1 and 2 randomized and position 3
containing one of the twenty possible codons. Steps 2
through 4 are repeated to randomize the third position
codon and synthesize the codon at the next position. The
process is continued until an oligonucleotide of the
desired length is achieved. After the final randomization
step, the oligonucleotide can be cleaved from the supports
and isolated by methods known to one skilled in the art.
Alternatively, the oligonucleotides can remain on the
supports for use in methods employing probe hybridization.
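The cycle of coupling, mixing and dividing described above can be sketched as a small simulation. This Python example is an illustration under simplifying assumptions, not the disclosed procedure itself: each support is modeled as carrying a single sequence, division among the vessels is exactly equal, and one codon is coupled per vessel. The function name split_and_mix and the particular twenty codons in CODONS are hypothetical choices made for the example.

```python
import random
from collections import Counter

# One illustrative codon per amino acid; any twenty codons could be chosen.
CODONS = ["TTT", "GTT", "TCT", "CCT", "TAT", "CAT", "TGT", "CGT", "CTG", "ATG",
          "CAG", "GAG", "ACT", "GCT", "AAT", "GAT", "TGG", "GGG", "ATA", "AAA"]

def split_and_mix(codons, n_positions, n_supports, seed=0):
    """Return the sequence carried by each support after n_positions cycles."""
    rng = random.Random(seed)
    supports = [""] * n_supports
    for _ in range(n_positions):
        rng.shuffle(supports)                       # mix the supports (trivial in the first cycle)
        for i in range(n_supports):                 # divide equally, one vessel per codon
            supports[i] += codons[i % len(codons)]  # couple that vessel's codon
    return supports

supports = split_and_mix(CODONS, n_positions=3, n_supports=2000)
second_position = Counter(s[3:6] for s in supports)
print(len(set(supports)), "distinct sequences on", len(supports), "supports")
```

Because the supports are divided equally after each mixing step, every codon appears on the same number of supports at every position, while the combinations across positions are set by the random mixing.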

The diversity of codon sequences, i.e., the number of
different possible oligonucleotides, which can be obtained
using the methods of the present invention, is extremely
large and only limited by the physical characteristics of
available materials. For example, a support composed of
beads of about 100 µm in diameter will be limited to about
10,000 beads/reaction vessel using a 1 [...] reaction vessel
containing 25 mg of beads. This size bead can support
about 1 x 10^[...] oligonucleotides per bead. Synthesis using
separate reaction vessels for each of the twenty amino
acids will produce [...]. The diversity [...] under these
conditions is approximately 10^[...] copies of 10,000 x 20
[...] random oligonucleotides. The diversity can be
increased, however, in several ways without departing from
the basic methods disclosed herein. For example, the number
of possible sequences can be increased by decreasing the
size of the individual beads which make up the support. A
bead of about 30 µm in diameter will increase the number of
beads per reaction vessel and therefore the number of
oligonucleotides synthesized. Another way to increase the
diversity of oligonucleotides with random codons is to
increase the volume of the reaction vessel. For example,
using the same size bead, a larger volume can contain a
greater number of beads than a smaller vessel and therefore
support the synthesis of a greater number of
oligonucleotides. Increasing the number of codons coupled
to a support in a single reaction vessel also increases the
diversity of the random oligonucleotides. The total
diversity will be the number of codons coupled per vessel
raised to the number of codon positions synthesized. For
example, using ten reaction vessels, each synthesizing two
codons to randomize a total of twenty codons, the number of
different oligonucleotides of ten codons in length per 100
µm bead can be increased where each bead will contain about
2^10 or 1 x 10^3 different sequences instead of one. One
skilled in the art will know how to modify such parameters
to increase the diversity of oligonucleotides with random
codons.
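The rule stated above, that the diversity equals the number of codons coupled per vessel raised to the number of codon positions, can be written out as a short calculation. The following Python sketch is illustrative; the function name diversity is hypothetical, and the per-bead figures assume that every codon mixture is coupled with equal efficiency.

```python
def diversity(codons_per_step, positions):
    """Number of distinct sequences when each position offers codons_per_step choices."""
    return codons_per_step ** positions

per_bead_twenty_vessels = diversity(1, 10)   # one codon coupled per vessel
per_bead_ten_vessels = diversity(2, 10)      # two codons coupled per vessel
population_two_positions = diversity(20, 2)  # twenty codons at each of two positions
print(per_bead_twenty_vessels, per_bead_ten_vessels, population_two_positions)
```

With one codon per vessel each bead carries a single sequence, whereas coupling two codons per vessel over ten positions gives 1,024 sequences per bead, which is the figure of about 1 x 10^3 referred to in the text.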

A method of synthesizing oligonucleotides having
random codons at each position using individual monomers
wherein the number of reaction vessels is less than the
number of codons to be randomized is also described. For
example, if twenty codons are to be randomized at each
position within an oligonucleotide population, then ten
reaction vessels can be used. The use of a smaller number
of reaction vessels than the number of codons to be
randomized at each position is preferred because the
smaller number of reaction vessels is easier to manipulate
and results in a greater number of possible
oligonucleotides synthesized.

The use of a smaller number of reaction vessels for
random synthesis of twenty codons at a desired position
within an oligonucleotide is similar to that described
above using twenty reaction vessels except that each
reaction vessel can contain the synthesis products of more
than one codon. For example, step one synthesis using ten
reaction vessels proceeds by coupling about two different
codons on supports contained in each of ten reaction
vessels. This is shown in Figure 2, where each of the two
codons coupled to a different support can consist of the
following sequences: (1) (T/G)TT for Phe and Val; (2)
(T/C)CT for Ser and Pro; (3) (T/C)AT for Tyr and His; (4)
(T/C)GT for Cys and Arg; (5) (C/A)TG for Leu and Met; (6)
(C/G)AG for Gln and Glu; (7) (A/G)CT for Thr and Ala; (8)
(A/G)AT for Asn and Asp; (9) (T/G)GG for Trp and Gly and
(10) A(T/A)A for Ile and Lys. The slash (/) signifies that
a mixture of the monomers indicated on each side of the
slash are used as if they were a single monomer in the
indicated coupling step. The antisense sequence for each of
the above codons can be generated by synthesizing the
complementary sequence. For example, the antisense for Phe
and Val can be AA(C/A). The amino acids encoded by each of
the above pairs of sequences are given in the standard
three letter nomenclature.

Coupling of the monomers in this fashion will yield
codons specifying all twenty naturally occurring
amino acids attached to supports in ten reaction vessels.
However, the number of individual reaction vessels to be
used will depend on the number of codons to be randomized
at the desired position and can be determined by one
skilled in the art. For example, if [...] codons are to be
randomized, then [...] reaction vessels can be used for
coupling. The codon sequences given above can be used for
this synthesis as well. The sequences of the codons can
also be changed to incorporate or be replaced by any of the
additional forty-four codons which constitute the genetic
code.

The remaining steps of synthesis [...] are performed with
supports [...] reaction vessels. These remaining steps are
shown in Figure 2 (steps 2 through 4).
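The ten degenerate codons listed for Figure 2 can be checked against the standard genetic code. The following Python sketch is illustrative only; the names VESSELS, expand and antisense are hypothetical, and entries that are difficult to read in the source (notably those for Asn/Asp and Trp/Gly) are inferred here from the stated requirement that the ten vessels together specify all twenty amino acids.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Each vessel lists the monomers coupled at codon positions 1, 2 and 3.
VESSELS = [
    ("TG", "T", "T"),  # Phe / Val
    ("TC", "C", "T"),  # Ser / Pro
    ("TC", "A", "T"),  # Tyr / His
    ("TC", "G", "T"),  # Cys / Arg
    ("CA", "T", "G"),  # Leu / Met
    ("CG", "A", "G"),  # Gln / Glu
    ("AG", "C", "T"),  # Thr / Ala
    ("AG", "A", "T"),  # Asn / Asp (inferred)
    ("TG", "G", "G"),  # Trp / Gly (inferred)
    ("A", "TA", "A"),  # Ile / Lys
]

def expand(vessel):
    """All codons produced in one vessel from its monomer mixtures."""
    return ["".join(c) for c in product(*vessel)]

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def antisense(codon):
    """Reverse complement of a codon."""
    return codon.translate(COMPLEMENT)[::-1]

encoded = [CODON_TABLE[c] for v in VESSELS for c in expand(v)]
print(sorted(encoded))
print(sorted(antisense(c) for c in expand(VESSELS[0])))
```

Each vessel yields exactly two codons, and the antisense codons for the first vessel come out as AAA and AAC, consistent with the AA(C/A) example in the text.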

Oligonucleotides having at least one specified tuplet
at a predetermined position and the remaining positions
having random tuplets can also be synthesized using the
methods described herein. The synthesis steps are similar
to those outlined above using twenty or fewer reaction
vessels except that prior to synthesis of the specified
codon position, the dividing of the supports into separate
reaction vessels for synthesis of different codons is
omitted. For example, if the codon at the second position
of the oligonucleotide is to be specified, then following
synthesis of random codons at the first position and mixing
of the supports, the mixed supports are not divided into
new reaction vessels but, instead, can be contained in a
single reaction vessel to synthesize the specified codon.
The specified codon is synthesized sequentially from
individual monomers as described above. Thus, the number
of reaction vessels can be increased or decreased at each
step to allow for the synthesis of a specified codon or a
desired number of random codons.

Following codon synthesis, the mixed supports are
divided into individual reaction vessels for synthesis of
the next codon to be randomized (Figure 1, step 3) or can
be used without separation for synthesis of a consecutive
specified codon. The rounds of synthesis can be repeated
for each codon to be added until the desired number of
positions with predetermined or randomized codons is
obtained.

Oligonucleotides with the first position
codon being specified can also be synthesized using the
above method. In this case, the first position codon is
synthesized from the appropriate monomers. The supports
are divided into the required number of reaction vessels
needed for synthesis of random codons at the second
position and the rounds of synthesis, mixing and dividing
are performed as described above.

A method of synthesizing oligonucleotides having
tuplets which are diverse but biased toward a predetermined
sequence is also described herein. This method employs two
reaction vessels, one vessel for the synthesis of a
predetermined sequence and the second vessel for the
synthesis of a random sequence. This method is
advantageous to use when a significant number of codon
positions, for example, are to be of a specified sequence
since it alleviates the use of multiple reaction vessels.
Instead, in one vessel a mixture of four different monomers
such as adenine, guanine, cytosine and thymine nucleotides
is used for the first and second monomers in the codon. The
codon is completed by coupling a mixture of a pair of
monomers of either guanine and thymine or cytosine and
adenine nucleotides at the third monomer position. In the
other vessel, nucleotide monomers are coupled sequentially
to yield the predetermined codon sequence. Mixing of the two
supports yields a population of oligonucleotides containing
both the predetermined codon and the random codons at the
desired position. Synthesis can proceed by using this
mixture of supports in a single reaction vessel, for
example, for coupling additional predetermined codons or by
further dividing the mixture into two reaction vessels for
synthesis of additional random codons.
The two reaction vessel method can be used for codon
synthesis within an oligonucleotide with a predetermined
tuplet sequence by dividing the support mixture into two
portions at the desired codon position to be randomized.
Additionally, this method allows for the extent of
randomization to be adjusted. For example, unequal mixing
or dividing of the two supports will change the fraction of
codons with predetermined sequences compared to those with
random codons at the desired position. Unequal mixing and
dividing of supports can be useful when there is a need to
synthesize random codons at a significant number of
positions within an oligonucleotide of a longer or shorter 

The extent of randomization can also be adjusted by
using unequal mixtures of monomers in the first, second and
third monomer coupling steps of the random codon position.
The unequal mixtures can be in any or all of the coupling
steps to yield a population of codons enriched in sequences
reflective of the monomer proportions.
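How unequal monomer mixtures translate into codon proportions can be sketched as follows. This Python example is illustrative and assumes that each monomer is incorporated in proportion to its share of the mixture and that the three coupling steps are independent; the function name codon_frequencies and the example proportions are hypothetical.

```python
from fractions import Fraction

def codon_frequencies(first, second, third):
    """Each argument maps a base to its proportion in that coupling step's mixture."""
    return {a + b + c: first[a] * second[b] * third[c]
            for a in first for b in second for c in third}

equal_four = {base: Fraction(1, 4) for base in "ACGT"}
# First coupling step enriched in adenine.
enriched_a = {"A": Fraction(7, 10), "C": Fraction(1, 10),
              "G": Fraction(1, 10), "T": Fraction(1, 10)}
# Third coupling step uses an equal mixture of guanine and thymine.
g_or_t = {"G": Fraction(1, 2), "T": Fraction(1, 2)}

freqs = codon_frequencies(enriched_a, equal_four, g_or_t)
print(freqs["ATG"], freqs["CTG"])
```

In this example, codons beginning with adenine are seven times as frequent as the corresponding codons beginning with any other base, reflecting the proportions chosen for the first coupling step.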

Synthesis of randomized oligonucleotides is performed
using methods well known to one skilled in the art. Linear
coupling of monomers can, for example, be accomplished
using phosphoramidite chemistry with a MilliGen/Biosearch
Cyclone Plus automated synthesizer as described by the
manufacturer (Millipore, Burlington, MA). Other
chemistries and automated synthesizers can be employed as
well and are known to one skilled in the art.

Synthesis of multiple codons can be performed without
modification to the synthesizer by separately synthesizing
the codons in individual sets of reactions. Alternatively,
modification of an automated DNA synthesizer can be
performed for the simultaneous synthesis of codons in
multiple reaction vessels.

In one embodiment, the invention provides a plurality
of procaryotic cells containing a diverse population of
expressible oligonucleotides operationally linked to
expression elements, the expressible oligonucleotides
having a desirable bias of random codon sequences produced
from diverse combinations of first and second
oligonucleotides having a desirable bias of random codon
sequences. The invention also provides a method for
constructing such a plurality of procaryotic cells.

The oligonucleotides synthesized by the above methods
can be used to express a plurality of random peptides which
are unbiased, diverse but biased toward a predetermined
sequence, or which contain at least one specified codon at
a predetermined position. The need will determine which
type of oligonucleotide is to be expressed to give the
resultant population of random peptides and is known to one
skilled in the art. Expression can be performed in any
compatible vector/host system. Such systems include, for
example, plasmids or phagemids in procaryotes such as E.
coli, yeast systems, and other eucaryotic systems such as
mammalian cells, but will be described herein in context
with its presently preferred embodiment, i.e., expression on
the surface of filamentous bacteriophage. Filamentous
bacteriophage can be, for example, M13, f1 and fd. Such
phage have circular single-stranded genomes and double
strand replicative DNA forms. Additionally, the peptides
can also be expressed in soluble or secreted form depending
on the need and the vector/host system employed.

Expression of random peptides on the surface of M13
can be accomplished, for example, using the vector system
shown in Figure 3. Construction of the vectors enabling
one of ordinary skill to make them is explicitly set out
in Examples I and II. The complete nucleotide sequences
are given in Figures 5, 6 and 7 (SEQ ID NOS: 1, 2 and 3,
respectively). This system produces random
oligonucleotides functionally linked to expression elements
and to gVIII by combining two smaller oligonucleotide
portions contained in separate vectors into a single
vector. The diversity of oligonucleotide species obtained
by this system or others described herein can be 5 x 10 or
greater. Diversity of less than 5 x 10 can also be
obtained and will be determined by the need and type of
random peptides to be expressed. The random combination of
two precursor portions into a larger oligonucleotide
increases the diversity of the population several fold and
has the added advantage of producing oligonucleotides
larger than what can be synthesized by standard methods.
Additionally, although the correlation is not known, when
the number of possible paths an oligonucleotide can take
during synthesis such as described herein is greater than
the number of beads, then there will be a correlation
between the synthesis path and the sequences obtained. By
combining oligonucleotide populations which are synthesized
separately, this correlation will be destroyed. Therefore,
any bias which may be inherent in the synthesis procedures
will be alleviated by joining two precursor portions into
a contiguous random oligonucleotide.
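
The diversity argument can be made concrete with a little
arithmetic. The sketch below is illustrative only: it assumes
precursor portions of ten codons with twenty codons per
position (the layout used in Example I), and the clone counts
passed to max_combined_diversity are hypothetical numbers, not
values taken from this specification.

```python
# Illustrative arithmetic only; the clone counts used below are hypothetical.
CODONS_PER_POSITION = 20   # one codon per amino acid, as in Example I
HALF_LENGTH = 10           # codons in each precursor portion


def sequence_space(length: int, codons: int = CODONS_PER_POSITION) -> int:
    """Number of distinct codon sequences of the given length."""
    return codons ** length


def max_combined_diversity(left_clones: int, right_clones: int) -> int:
    """Upper bound on distinct joined sequences when every left-half
    clone can pair with every right-half clone."""
    return left_clones * right_clones


half_space = sequence_space(HALF_LENGTH)
full_space = sequence_space(2 * HALF_LENGTH)
print(f"possible half sequences:   {half_space:.3e}")
print(f"possible joined sequences: {full_space:.3e}")
print(max_combined_diversity(10_000, 10_000))
```

The point of the calculation is that random pairing multiplies
the two clone counts, so two modest precursor libraries can
give a much larger combined library.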

Populations of precursor oligonucleotides to be
combined into an expressible form are each cloned into
separate vectors. The two precursor portions which make up
the combined oligonucleotide correspond to the carboxy and
amino terminal portions of the expressed peptide. Each
precursor oligonucleotide can encode either the sense or
anti-sense strand, and this will depend on the orientation
of the expression elements and the gene encoding the fusion
portion of the protein as well as the mechanism used to
join the two precursor oligonucleotides. For the vectors
shown in Figure 3, precursor oligonucleotides corresponding
to the carboxy terminal portion of the peptide encode the
sense strand. Those corresponding to the amino terminal
portion encode the anti-sense strand. Oligonucleotide
populations are inserted between the Eco RI and Sac I
restriction enzyme sites in M13IX22 and M13IX42 (Figure 3A
and B). M13IX42 (SEQ ID NO: 1) is the vector used for
sense strand precursor oligonucleotide portions and M13IX22
(SEQ ID NO: 2) is used for anti-sense precursor portions.

The populations of randomized oligonucleotides
inserted into the vectors are synthesized with Eco RI and
Sac I recognition sequences flanking opposite ends of the
random codon sequences. The sites allow annealing and
ligation of these single strand oligonucleotides into a
double stranded vector restricted with Eco RI and Sac I.
Alternatively, the oligonucleotides can be inserted into
the vector by standard mutagenesis methods. In this latter
method, single stranded vector DNA is isolated from the
phage and annealed with random oligonucleotides having
known sequences complementary to vector sequences. The
oligonucleotides are extended with DNA polymerase to
produce double stranded vectors containing the randomized
oligonucleotides.

The vector used for sense strand oligonucleotide
portions, M13IX42 (Figure 3B), contains downstream and in
frame with the Eco RI and Sac I restriction sites a
sequence encoding the pseudo-wild type gVIII product. This
gene encodes the wild type M13 gVIII amino acid sequence
but has been changed at the nucleotide level to reduce
homologous recombination with the wild type gVIII contained
on the same vector. The wild type gVIII is present to
ensure that at least some functional, non-fusion coat
protein will be produced. The inclusion of a wild type
gVIII therefore reduces the possibility of non-viable phage
production and biological selection against certain peptide
fusion proteins. Differential regulation of the two genes
can also be used to control the relative ratio of the
pseudo and wild type proteins.

Also contained downstream and in frame with the Eco RI
and Sac I restriction sites is an amber stop codon. The
mutation is located six codons downstream from Sac I and
therefore lies between the inserted oligonucleotides and
the gVIII sequence. As was the function of the wild type
gVIII, the amber stop codon also reduces biological
selection when combining precursor portions to produce
expressible oligonucleotides. This is accomplished by
using a non-suppressor (sup O) host strain because
non-suppressor strains will terminate expression after the
oligonucleotide sequences but before the pseudo gVIII
sequences. Therefore, the pseudo gVIII will never be
expressed on the phage surface under these circumstances.
Instead, only soluble peptides will be produced.
Expression in a non-suppressor strain can be advantageously
utilized when one wishes to produce large populations of
soluble peptides. Stop codons other than amber, such as
opal and ochre, or molecular switches, such as inducible
repressor elements, can also be used to unlink peptide
expression from surface expression. Additional controls
exist as well and are described below.
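
The effect of the amber codon can be illustrated with a small
translation sketch. This is a toy model, not the vector
sequence: it assumes the standard genetic code and assumes
that the suppressor host inserts glutamine at TAG (as a
supE-type suppressor does). The example construct and the
function name are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in TCAG order ('*' marks a stop codon).
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str, amber_suppressor: bool = False) -> str:
    """Translate dna in frame and stop at the first stop codon.

    If amber_suppressor is True, TAG is read through as glutamine (Q);
    this models a supE-type host and is an assumption of the sketch.
    """
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if codon == "TAG" and amber_suppressor:
            peptide.append("Q")
            continue
        aa = CODON_TABLE[codon]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# Hypothetical construct: two random codons, an amber codon, then two
# codons standing in for the start of the pseudo gVIII sequence.
construct = "TTTGGG" + "TAG" + "GCTGAA"
print(translate(construct))                         # soluble peptide only
print(translate(construct, amber_suppressor=True))  # read-through fusion
```

In the non-suppressor case translation ends before the
stand-in gVIII codons; in the suppressor case the peptide and
the downstream sequence are read as one product.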

The vector used for anti-sense strand oligonucleotide
portions, M13IX22 (Figure 3A), contains the expression
elements for the peptide fusion proteins. Upstream and in
frame with the Sac I and Eco RI sites in this vector is a
leader sequence for surface expression. A ribosome binding
site and Lac Z promoter/operator elements are present for
transcription and translation of the peptide fusion
proteins.

Both vectors contain a pair of Fok I restriction
enzyme sites (Figure 3A and B) for joining together the
precursor oligonucleotide portions and their vector
sequences. One site is located at the ends of each
precursor oligonucleotide which is to be joined. The
second Fok I site within the vectors is located at the end
of the vector sequences which are to be joined. The 5'
overhang of this second Fok I site has been altered to
encode a sequence which is not found in the overhangs
produced at the first Fok I site within the oligonucleotide
portions. The two sites allow the cleavage of each
circular vector into two portions and subsequent ligation
of essential components within each vector into a single
circular vector where the two oligonucleotide precursor
portions form a contiguous sequence. Non-compatible
overhangs produced at the two Fok I sites allow
optimal conditions to be selected for performing
concatemerization or circularization reactions for joining
the two vector portions. Such selection of conditions can
be used to govern the reaction order and therefore increase
the efficiency of joining.

Fok I is a restriction enzyme whose recognition
sequence is distal to the point of cleavage. Distal
placement of the recognition sequence in its location to
the cleavage point is important since if the two were
superimposed within the oligonucleotide portions to be
combined, it would lead to an invariant codon sequence at
the juncture. To alleviate the formation of invariant
codons at the juncture, Fok I recognition sequences can be
placed outside of the random codon sequence and still be
used to restrict within the random sequence. Subsequent
annealing of the single-strand overhangs produced by Fok I
and ligation of the two oligonucleotide precursor portions
allows the juncture to be formed. A variety of restriction
enzymes restrict DNA by this mechanism and can be used
instead of Fok I to join precursor oligonucleotides without
creating invariant codon sequences. Such enzymes include,
for example, Alw I, Bbv I, Bsp MI, Hga I, Hph I, Mbo II,
Mnl I, Ple I and Sfa NI. One skilled in the art knows how
to substitute Fok I recognition sequences for alternative
enzyme recognition sequences such as those above, and use
the appropriate enzyme for joining precursor
oligonucleotide portions.
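
The distal cleavage can be sketched in a few lines. The sketch
assumes the commonly cited Fok I specificity (recognition
sequence GGATG, cutting 9 nucleotides downstream on the top
strand and 13 on the bottom strand, which leaves a four-base
5' overhang). The example sequence and the function name are
hypothetical, and only a site in the top-strand orientation is
handled.

```python
from typing import Tuple

FOKI_SITE = "GGATG"
TOP_OFFSET = 9      # bases between site end and top-strand cut (assumed)
BOTTOM_OFFSET = 13  # bases between site end and bottom-strand cut (assumed)


def foki_overhang(top_strand: str) -> Tuple[int, str]:
    """Return (top-strand cut index, overhang read 5'->3' on the top strand)
    for the first Fok I site found in top_strand.

    Raises ValueError if there is no site or too little downstream sequence.
    """
    site_start = top_strand.find(FOKI_SITE)
    if site_start < 0:
        raise ValueError("no Fok I site found")
    site_end = site_start + len(FOKI_SITE)
    top_cut = site_end + TOP_OFFSET
    bottom_cut = site_end + BOTTOM_OFFSET
    if bottom_cut > len(top_strand):
        raise ValueError("sequence too short downstream of the site")
    # The bases between the two staggered cuts are left single-stranded.
    return top_cut, top_strand[top_cut:bottom_cut]


# Hypothetical example: site, nine spacer bases, then the four bases
# that end up in the overhang.
example = "AC" + FOKI_SITE + "ACGTACGTA" + "TTCT" + "AAAA"
print(foki_overhang(example))
```

Because the cut falls outside the recognition sequence, the
four overhang bases can be any sequence, which is what lets
the juncture sit inside the random codons.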

Although the sequences of the precursor
oligonucleotides are random and will invariably have
oligonucleotides within the two precursor populations whose
sequences are sufficiently complementary to anneal after
cleavage, the efficiency of annealing can be increased by
insuring that the single-strand overhangs within one
precursor population will have a complementary sequence
within the second precursor population. This can be
accomplished by synthesizing a non-degenerate series of
known sequences at the Fok I cleavage site coding for each
of the twenty amino acids. Since the Fok I cleavage site
contains a four base overhang, forty different sequences
are needed to randomly encode all twenty amino acids. For
example, if two precursor populations of ten codons in
length are to be combined, then after the ninth codon
position is synthesized, the mixed population of supports
is divided into forty reaction vessels for each of the
populations and complementary sequences for each of the
corresponding reaction vessels between populations are
independently synthesized. The sequences are shown in
Tables III and VI of Example I where the oligonucleotides
on columns 1R through 40R form complementary overhangs with
the oligonucleotides on the corresponding columns 1L
through 40L once cleaved. The degenerate X positions in
Table VI are necessary to maintain the reading frame once
the precursor oligonucleotide portions are joined.
However, restriction enzymes which produce a blunt
end, such as Mnl I, can alternatively be used in place of
Fok I to alleviate the degeneracy introduced in maintaining
the reading frame.

The last feature exhibited by each of the vectors is
an amber stop codon located in an essential coding sequence
within the vector portion lost during combining (Figure
3C). The amber stop codon is present to select for viable
phage produced from only the proper combination of
precursor oligonucleotides and their vector sequences into
a single vector species. Other non-sense mutations or
selectable markers can work as well.

The combining step randomly brings together different
precursor oligonucleotides within the two populations into
a single vector (Figure 3C; M13IX). The vector sequences
donated from each independent vector, M13IX22 and M13IX42,
are necessary for production of viable phage. Also, since
the expression elements are contained in M13IX22 and the
gVIII sequences are contained in M13IX42, expression of
functional gVIII-peptide fusion proteins cannot be
accomplished until the sequences are linked as shown in
M13IX.

The combining step is performed by restricting each
population of vectors containing randomized
oligonucleotides with Fok I, mixing and ligating (Figure
3C). Any vectors generated which contain an amber stop
codon will not produce viable phage when introduced into a
non-suppressor strain. Therefore, only the
sequences which do not contain an amber stop codon will
make up the final population of vectors contained in the
library.

,or surface e>^«==^°" f^,^' ,3ctor portions can he 



The invention provides a method of selecting
peptides capable of being bound by a ligand binding protein
from a population of random peptides by: (a) operationally
linking a diverse population of first oligonucleotides
having a desirable bias of random codon sequences to a
first vector; (b) operationally linking a diverse
population of second oligonucleotides having a desirable
bias of random codon sequences to a second vector; (c)
combining the vector products of steps (a) and (b) under
conditions where said populations of first and second
oligonucleotides are joined together into a population of
combined vectors; (d) introducing said population of
combined vectors into a compatible host under conditions
sufficient for expressing said population of random
peptides; and (e) determining the peptides which bind to
said binding protein. The invention also provides for
determining the encoding nucleic acid sequence of such
peptides as well.
Surface expression of the random peptide library is
performed in an amber suppressor strain. As described
above, the amber stop codon between the random codon
sequence and the gVIII sequence unlinks the two components
in a non-suppressor strain. Isolating the phage produced
from the non-suppressor strain and infecting a suppressor
strain will link the random codon sequences to the gVIII
sequence during expression (Figure 3B). Culturing the
suppressor strain after infection allows the expression of
all peptide species within the library as gVIII-peptide
fusion proteins. Alternatively, the DNA can be isolated
from the non-suppressor strain and then introduced into a
suppressor strain to accomplish the same effect.



The level of expression of gVIII-peptide fusion
proteins can additionally be controlled at the
transcriptional level. The gVIII-peptide fusion proteins
are under the inducible control of the Lac Z
promoter/operator system. Other inducible promoters can
work as well and are known by one skilled in the art. For
high levels of surface expression, the suppressor library
is cultured in an inducer of the Lac Z promoter such as
isopropylthio-β-galactoside (IPTG). Inducible control is
beneficial because biological selection against
non-functional gVIII-peptide fusion proteins can be
minimized by culturing the library under non-expressing
conditions. Expression can then be induced only at the
time of screening to ensure that the entire population of
oligonucleotides within the library is accurately
represented on the phage surface. Also this can be used to
control the valency of the peptide on the phage surface.

The surface expression library is screened for
specific peptides which bind ligand binding proteins by
standard affinity isolation procedures. Such methods
include, for example, panning, affinity chromatography and
solid phase blotting procedures. Panning as described by
Parmley and Smith, Gene 73:305-318 (1988), which is
incorporated herein by reference, is preferred because high
titers of phage can be screened easily, quickly and in
small volumes. Furthermore, this procedure can select
minor peptide species within the population, which
otherwise would have been undetectable, and amplify them to
substantially homogenous populations. The selected peptide
sequences can be determined by sequencing the nucleic acid
encoding such peptides after amplification of the phage
population.

The invention provides a plurality of procaryotic
cells containing a diverse population of oligonucleotides
having a desirable bias of random codon sequences that are
operationally linked to expression sequences. The
invention provides for methods of constructing such
populations of cells as well.

Random oligonucleotides synthesized by any of the
methods described previously can also be expressed on the
surface of filamentous bacteriophage, such as M13, for
example, without the joining together of precursor
oligonucleotides. A vector such as that shown in Figure 4,
M13IX30, can be used. This vector exhibits all the
functional features of the combined vector shown in Figure
3C for surface expression of gVIII-peptide fusion proteins.
The complete nucleotide sequence for M13IX30 (SEQ ID NO: 3)
is shown in Figure 7.

M13IX30 contains a wild type gVIII for phage viability
and a pseudo gVIII sequence for peptide fusions. The
vector also contains in frame restriction sites for cloning
random peptides. The cloning sites in this vector are Xho
I, Stu I and Spe I. Oligonucleotides should therefore be
synthesized with the appropriate complementary ends for
annealing and ligation or insertional mutagenesis.
Alternatively, the appropriate termini can be generated by
PCR technology. Between the restriction sites and the
pseudo gVIII sequence is an in-frame amber stop codon,
again ensuring complete viability of phage in constructing
and manipulating the library. Expression and screening are
performed as described above for the surface expression
library of oligonucleotides generated from precursor
portions.

Thus, the invention provides a method of selecting
peptides capable of being bound by a ligand binding protein
from a population of random peptides by (a) operationally
linking a diverse population of oligonucleotides having a
desirable bias of random codon sequences to expression
elements; (b) introducing said population of vectors into
a compatible host under conditions sufficient for
expressing said population of random peptides; and (c)
determining the peptides which bind to said binding
protein. Also provided is a method for determining the
encoding nucleic acid sequence of such selected peptides.

The following examples are intended to illustrate, but
not limit the invention.

EXAMPLE I 

Isolation and Characterization of Peptide Ligands Generated
From Right and Left Half Random Oligonucleotides

This example shows the synthesis of random
oligonucleotides and the construction and expression of
surface expression libraries of the encoded randomized
peptides. The random peptides of this example derive from
the mixing and joining together of two random
oligonucleotides. Also demonstrated is the isolation and
characterization of peptide ligands and their corresponding
nucleotide sequence for specific binding proteins.
The synthesis of two randomized oligonucleotides which
correspond to smaller portions of a larger randomized
oligonucleotide is shown below. Each of the two smaller
portions makes up one-half of the larger oligonucleotide.
The population of randomized oligonucleotides constituting
each half are designated the right and left half. Each
population of right and left halves is ten codons in
length with twenty random codons at each position. The
right half corresponds to the sense sequence of the
randomized oligonucleotides and encodes the carboxy terminal
half of the expressed peptides. The left half corresponds
to the anti-sense sequence of the randomized
oligonucleotides and encodes the amino terminal half of the
expressed peptides. The right and left halves of the
randomized oligonucleotide populations are cloned into
separate vector species and then joined so that
the right and left halves come together in random
combination to produce a single expression vector species
which contains a population of randomized oligonucleotides
twenty codons in length. Introduction of the vector
population into an appropriate host produces filamentous
phage which express the random peptides on their surface.

The reaction vessels for oligonucleotide synthesis
were obtained from the manufacturer of the synthesizer
(Millipore, Burlington, MA; supplier of MilliGen/Biosearch
Cyclone Plus Synthesizer). The vessels were supplied as
packages containing empty reaction columns (1 μmole),
frits, crimps and plugs (MilliGen/Biosearch catalog ...).
Derivatized and underivatized control pore glass,
phosphoramidite nucleotides and synthesis reagents were
also obtained from MilliGen/Biosearch. Crimper and
decrimper tools were obtained from Fisher Scientific Co.,
Pittsburgh, PA (Catalog numbers 06-406-20 and 06-406...,
respectively).
Ten reaction columns were used for right half
synthesis of random oligonucleotides ten codons in length.
The oligonucleotides have 5 monomers at their 3' end of the
sequence 5'GAGCT3' and 8 monomers at their 5' end of the
sequence 5'AATTCCAT3'. The synthesizer was fitted with a
column derivatized with a thymine nucleotide (T-column,
MilliGen/Biosearch # 0615.50) and was programmed to
synthesize the sequences shown in Table I for each of ten
columns in independent reaction sets. The sequence of the
last three monomers (from right to left since synthesis
proceeds 3' to 5') encodes the indicated amino acids:

Table I

Column        Sequence (5' to 3')    Amino Acids

column 1R     (T/G)TTGAGCT           Phe and Val
column 2R     (T/C)CTGAGCT           Ser and Pro
column 3R     (T/C)ATGAGCT           Tyr and His
column 4R     (T/C)GTGAGCT           Cys and Arg
column 5R     (C/A)TGGAGCT           Leu and Met
column 6R     (C/G)AGGAGCT           Gln and Glu
column 7R     (A/G)CTGAGCT           Thr and Ala
column 8R     (A/G)ATGAGCT           Asn and Asp
column 9R     (T/G)GGGAGCT           Trp and Gly
column 10R    A(T/A)AGAGCT           Ile and Lys

where the two monomers in parentheses denote a single
monomer position within the codon and indicate that an
equal mixture of each monomer was added to the reaction for
coupling. The monomer coupling reactions for each of the
10 columns were performed as recommended by the
manufacturer (amidite version S1.06, # 8400-050990, scale
1 μM). After the last coupling reaction, the columns were
washed with acetonitrile and lyophilized to dryness.
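
The parenthesized notation can be checked mechanically. The
following sketch expands the codon portion of each Table I
entry (the trailing GAGCT is the Sac I end and is left out)
and translates the result with the standard genetic code. The
helper name expand and the list TABLE_I_CODONS are
illustrative, not part of the original protocol.

```python
import re
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in TCAG order ('*' marks a stop codon).
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Codon portion of the ten Table I entries, columns 1R through 10R.
TABLE_I_CODONS = ["(T/G)TT", "(T/C)CT", "(T/C)AT", "(T/C)GT", "(C/A)TG",
                  "(C/G)AG", "(A/G)CT", "(A/G)AT", "(T/G)GG", "A(T/A)A"]


def expand(degenerate: str) -> list:
    """Expand e.g. '(T/G)TT' into ['TTT', 'GTT']."""
    tokens = re.findall(r"\(([ACGT])/([ACGT])\)|([ACGT])", degenerate)
    choices = [(a, b) if a else (single,) for a, b, single in tokens]
    return ["".join(p) for p in product(*choices)]


encoded = {codon: CODON_TABLE[codon]
           for entry in TABLE_I_CODONS
           for codon in expand(entry)}

for entry in TABLE_I_CODONS:
    print(entry, [(c, encoded[c]) for c in expand(entry)])
print("distinct amino acids:", len(set(encoded.values())))
```

Under the standard code the ten columns yield twenty codons,
one for each amino acid, with no stop codon among them; the
last column, A(T/A)A, expands to ATA (isoleucine) and AAA
(lysine).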

Following synthesis, the plugs were removed from each
column using a decrimper and the reaction products were
pooled. Due to the weight of the monomers the bead mass
initially increases; however, at later rounds of synthesis
material is lost. In either case, the material was
equalized with underivatized control pore glass and mixed
thoroughly to obtain a random distribution of all twenty
codon species. The reaction products were then aliquotted
into 10 new reaction columns by removing portions of the
material and placing them in separate reaction columns.
Alternatively, the reaction products can be aliquotted by
suspending the beads in a liquid that is dense enough for
the beads to remain dispersed, preferably a liquid that is
equal in density to the beads, and then aliquoting equal
volumes of the suspension into separate reaction columns.
The lip on the inside of the columns where the frits rest
was cleared of material using vacuum suction with a syringe
and 25 G needle. New frits were placed onto the lips, the
plugs were fitted into the columns and were crimped into
place using a crimper.

Synthesis of the second codon position was achieved
using the above 10 columns containing the random mixture of
reaction products from the first codon synthesis. The
monomer coupling reactions for the second codon position
are shown in Table II. An A in the first position means
that any monomer can be programmed into the synthesizer
at that position; the first monomer position is not coupled
by the synthesizer since the software assumes that the
monomer is already attached to the column. An A also
denotes that the columns from the previous codon synthesis
should be placed on the synthesizer for use in the present
synthesis round. Reactions were again sequentially
repeated for each column as shown in Table II and the
reaction products washed and dried as described above.
Table II

Column        Sequence (5' to 3')    Amino Acids

column 1R     (T/G)TTA               Phe and Val
column 2R     (T/C)CTA               Ser and Pro
column 3R     (T/C)ATA               Tyr and His
column 4R     (T/C)GTA               Cys and Arg
column 5R     (C/A)TGA               Leu and Met
column 6R     (C/G)AGA               Gln and Glu
column 7R     (A/G)CTA               Thr and Ala
column 8R     (A/G)ATA               Asn and Asp
column 9R     (T/G)GGA               Trp and Gly
column 10R    A(T/A)AA               Ile and Lys



Randomization of the second codon position was achieved by
removing the reaction products from each of the columns and
thoroughly mixing the material. The material was again
divided into new reaction columns and prepared for monomer
coupling reactions as described above.

Random synthesis of the next seven codons (positions
3 through 9) proceeded identically to the cycle described
above for the second codon position and again used the
monomer sequences of Table II. Each of the newly repacked
columns containing the random mixture of reaction products
from synthesis of the previous codon position was used for
the synthesis of the subsequent codon position. After
synthesis of the codon at position nine and mixing of the
reaction products, the material was divided and repacked
into 40 different columns and the monomer sequences shown
in Table III were coupled to each of the 40 columns in
independent reactions. The oligonucleotides from each of
the 40 columns were mixed once more and cleaved from the
control pore glass as recommended by the manufacturer.
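
The repeated cycle of coupling, pooling and re-dividing can be
simulated in a few lines. This is a simplified sketch under
stated assumptions: each simulated "bead" stands for a single
growing chain, the beads are dealt into the ten columns in
equal shares, and the function name and parameters are
hypothetical.

```python
import random

# The two codons coupled in each of the ten columns (Tables I and II).
COLUMN_CODONS = [("TTT", "GTT"), ("TCT", "CCT"), ("TAT", "CAT"),
                 ("TGT", "CGT"), ("CTG", "ATG"), ("CAG", "GAG"),
                 ("ACT", "GCT"), ("AAT", "GAT"), ("TGG", "GGG"),
                 ("ATA", "AAA")]


def split_and_mix(n_beads: int, n_positions: int, seed: int = 0) -> list:
    """Simulate codon-wise split synthesis.

    Each round the beads are pooled and shuffled (mixing), dealt into
    ten columns, and every bead receives one of its column's two codons.
    """
    rng = random.Random(seed)
    beads = [[] for _ in range(n_beads)]
    for _ in range(n_positions):
        rng.shuffle(beads)                       # pool and mix
        for i, bead in enumerate(beads):
            column = i % len(COLUMN_CODONS)      # equal aliquots
            bead.append(rng.choice(COLUMN_CODONS[column]))
    # Synthesis proceeds 3' to 5', so the last codon coupled is 5'-most.
    return ["".join(reversed(bead)) for bead in beads]


library = split_and_mix(n_beads=1000, n_positions=9, seed=1)
print(library[:3])
print("distinct sequences:", len(set(library)))
```

Because the beads are reshuffled before every round, the codon
a chain receives at one position is independent of the codons
it received earlier, which is the purpose of the mixing step.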
Table III

Column        Sequence (5' to 3')

column 1R     AATTCTTTTA
column 2R     AATTCTGTTA
column 3R     AATTCGTTTA
column 4R     AATTCGGTTA
column 5R     AATTCTTCTA
column 6R     AATTCTCCTA
column 7R     AATTCGTCTA
column 8R     AATTCGCCTA
column 9R     AATTCTTATA
column 10R    AATTCTCATA
column 11R    AATTCGTATA
column 12R    AATTCGCATA
column 13R    AATTCTTGTA
column 14R    AATTCTCGTA
column 15R    AATTCGTGTA
column 16R    AATTCGCGTA
column 17R    AATTCTCTGA
column 18R    AATTCTATGA
column 19R    AATTCGCTGA
column 20R    AATTCGATGA
column 21R    AATTCTCAGA
column 22R    AATTCTGAGA
column 23R    AATTCGCAGA
column 24R    AATTCGGAGA
column 25R    AATTCTACTA
column 26R    AATTCTGCTA
column 27R    AATTCGACTA
column 28R    AATTCGGCTA
column 29R    AATTCTAATA
column 30R    AATTCTGATA
column 31R    AATTCGAATA
column 32R    AATTCGGATA
column 33R    AATTCTTGGA
column 34R    AATTCTGGGA
column 35R    AATTCGTGGA
column 36R    AATTCGGGGA
column 37R    AATTCTATAA
column 38R    AATTCTAAAA
column 39R    AATTCGATAA
column 40R    AATTCGAAAA
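
The forty entries of Table III follow a regular pattern: after
the AATTC end, each four-base stretch is one of the twenty
codons of Table I preceded by a T or a G, followed by the
column-bound A. The sketch below regenerates the entries from
that pattern; the function and variable names are illustrative
and the pattern is read off the table rather than stated in
the text.

```python
# The two codons coupled in each of the ten columns (Tables I and II).
CODON_PAIRS = [("TTT", "GTT"), ("TCT", "CCT"), ("TAT", "CAT"),
               ("TGT", "CGT"), ("CTG", "ATG"), ("CAG", "GAG"),
               ("ACT", "GCT"), ("AAT", "GAT"), ("TGG", "GGG"),
               ("ATA", "AAA")]


def table_iii_four_mers() -> list:
    """Return the forty four-base stretches in column order 1R..40R:
    a leading T or G followed by one of the twenty codons."""
    return [lead + codon
            for pair in CODON_PAIRS
            for lead in "TG"
            for codon in pair]


four_mers = table_iii_four_mers()
for number, four_mer in enumerate(four_mers, start=1):
    # AATTC prefix and trailing A as written in Table III.
    print(f"column {number}R", "AATTC" + four_mer + "A")
```

Two leading bases times twenty codons gives the forty distinct
sequences, which matches the count of forty reaction vessels
used at the tenth codon position.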

Left half synthesis of random oligonucleotides
proceeded similarly to the right half synthesis. This half
of the oligonucleotide corresponds to the anti-sense
sequence of the encoded randomized peptides. Thus, the
complementary sequence of the codons in Tables I through
III are synthesized. The left half oligonucleotides also
have 5 monomers at their 3' end of the sequence 5'GAGCT3'
and 8 monomers at their 5' end of the sequence
5'AATTCCAT3'. The rounds of synthesis, washing, drying,
mixing, and dividing are as described above.
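
The anti-sense codons for the left half can be derived from
the right-half codons by taking the reverse complement of the
degenerate notation. A minimal sketch follows; the function
name is illustrative and only the two-base (X/Y) notation used
in the tables is supported.

```python
import re

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement_degenerate(entry: str) -> str:
    """Reverse-complement a codon written with (X/Y) mixed positions,
    e.g. '(T/G)TT' -> 'AA(A/C)'."""
    tokens = re.findall(r"\([ACGT]/[ACGT]\)|[ACGT]", entry)
    out = []
    for token in reversed(tokens):
        if token.startswith("("):
            first, second = token[1], token[3]
            out.append(f"({COMPLEMENT[first]}/{COMPLEMENT[second]})")
        else:
            out.append(COMPLEMENT[token])
    return "".join(out)


RIGHT_HALF_CODONS = ["(T/G)TT", "(T/C)CT", "(T/C)AT", "(T/C)GT", "(C/A)TG",
                     "(C/G)AG", "(A/G)CT", "(A/G)AT", "(T/G)GG", "A(T/A)A"]

for number, codon in enumerate(RIGHT_HALF_CODONS, start=1):
    print(f"{number}R {codon} -> {number}L {reverse_complement_degenerate(codon)}")
```

Each mixed position keeps its place in the reversed codon, so
the equal mixture of two monomers on one strand pairs with an
equal mixture of their complements on the other.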

For the first codon position, the synthesizer was
fitted with a T-column and programmed to synthesize the
sequences shown in Table IV for each of ten columns in
independent reaction sets. As with right half synthesis,
the sequence of the last three monomers (from right to
left) encodes the indicated amino acids:
Table IV

Column        Sequence (5' to 3')    Amino Acids

column 1L     AA(A/C)GAGCT           Phe and Val
column 2L     AG(A/G)GAGCT           Ser and Pro
column 3L     AT(A/G)GAGCT           Tyr and His
column 4L     AC(A/G)GAGCT           Cys and Arg
column 5L     CA(G/T)GAGCT           Leu and Met
column 6L     CT(G/C)GAGCT           Gln and Glu
column 7L     AG(T/C)GAGCT           Thr and Ala
column 8L     AT(T/C)GAGCT           Asn and Asp
column 9L     CC(A/C)GAGCT           Trp and Gly
column 10L    T(A/T)TGAGCT           Ile and Lys



Following washing and drying, the plugs for each column
were removed, mixed and aliquotted into ten new reaction
columns as described above. Synthesis of the second codon
position was achieved using these ten columns containing
the random mixture of reaction products from the first
codon synthesis. The monomer coupling reactions for the
second codon position are shown in Table V.



Table V

Column        Sequence (5' to 3')    Amino Acids

column 1L     AA(A/C)A               Phe and Val
column 2L     AG(A/G)A               Ser and Pro
column 3L     AT(A/G)A               Tyr and His
column 4L     AC(A/G)A               Cys and Arg
column 5L     CA(G/T)A               Leu and Met
column 6L     CT(G/C)A               Gln and Glu
column 7L     AG(T/C)A               Thr and Ala
column 8L     AT(T/C)A               Asn and Asp
column 9L     CC(A/C)A               Trp and Gly
column 10L    T(A/T)TA               Ile and Lys
Again, randomization of the second codon position was
achieved by removing the reaction products from each of the
columns and thoroughly mixing the beads. The beads were
repacked into ten new reaction columns.

Random synthesis of the next seven codon positions
proceeded identically to the cycle described above for the
second codon position and again used the monomer sequences
of Table V. After synthesis of the codon at position nine
and mixing of the reaction products, the material was
divided and repacked into 40 different columns and the
monomer sequences shown in Table VI were coupled to each of
the 40 columns in independent reactions.



Table VI

Column        Sequence (5' to 3')

column 1L     AATTCCATAAAAXXA
column 2L     AATTCCATAAACXXA
column 3L     AATTCCATAACAXXA
column 4L     AATTCCATAACCXXA
column 5L     AATTCCATAGAAXXA
column 6L     AATTCCATAGACXXA
column 7L     AATTCCATAGGAXXA
column 8L     AATTCCATAGGCXXA
column 9L     AATTCCATATAAXXA
column 10L    AATTCCATATACXXA
column 11L    AATTCCATATGAXXA
column 12L    AATTCCATATGCXXA
column 13L    AATTCCATACAAXXA
column 14L    AATTCCATACACXXA
column 15L    AATTCCATACGAXXA
column 16L    AATTCCATACGCXXA
column 17L    AATTCCATCAGAXXA
column 18L    AATTCCATCAGCXXA
column 19L    AATTCCATCATAXXA
column 20L    AATTCCATCATCXXA
column 21L    AATTCCATCTGAXXA
column 22L    AATTCCATCTGCXXA
column 23L    AATTCCATCTCAXXA
column 24L    AATTCCATCTCCXXA
column 25L    AATTCCATAGTAXXA
column 26L    AATTCCATAGTCXXA
column 27L    AATTCCATAGCAXXA
column 28L    AATTCCATAGCCXXA
column 29L    AATTCCATATTAXXA
column 30L    AATTCCATATTCXXA
column 31L    AATTCCATATCAXXA
column 32L    AATTCCATATCCXXA
column 33L    AATTCCATCCAAXXA
column 34L    AATTCCATCCACXXA
column 35L    AATTCCATCCCAXXA
column 36L    AATTCCATCCCCXXA
column 37L    AATTCCATTATAXXA
column 38L    AATTCCATTATCXXA
column 39L    AATTCCATTTTAXXA
column 40L    AATTCCATTTTCXXA



The first two monomers denoted by an "X" represent an equal
mixture of all four nucleotides at that position. This is
necessary to retain a relatively unbiased codon sequence at the
junction between right and left half oligonucleotides. The
above right and left half random oligonucleotides were
cleaved and purified from the supports and used in
constructing the surface expression libraries below.
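
As a rough illustration of what the "X" notation in Table VI implies, the following Python sketch enumerates the individual sequences represented by a single column entry. The function name `expand_mixed_positions` is hypothetical and is not part of the original disclosure; the sketch only assumes that each X stands for an equal mixture of A, C, G and T.

```python
from itertools import product

NUCLEOTIDES = "ACGT"

def expand_mixed_positions(template: str) -> list[str]:
    """Enumerate every sequence encoded by a template in which each
    'X' stands for an equal mixture of all four nucleotides."""
    # Each position contributes either one fixed base or a pool of four.
    pools = [NUCLEOTIDES if base == "X" else base for base in template]
    return ["".join(combo) for combo in product(*pools)]

# Column 1L from Table VI: two mixed positions give 4 x 4 species.
variants = expand_mixed_positions("AATTCCATAAAAXXA")
print(len(variants))
print(variants[0], variants[-1])
```

Under these assumptions, every column in Table VI represents sixteen distinct species that differ only at the two mixed positions.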

Two M13-based vectors, M13IX42 (SEQ ID NO: 1) and
M13IX22 (SEQ ID NO: 2), were constructed for the cloning
and propagation of right and left half populations of
random oligonucleotides, respectively. The vectors were
specially constructed to facilitate the random joining and
subsequent expression of right and left half
oligonucleotide populations. Each vector within the
population contains one right and one left half
oligonucleotide from the population joined together to form
a single contiguous oligonucleotide with random codons
which is twenty-two codons in length. The resultant
population of vectors is used to construct a surface
expression library.

M13IX42, or the right-half vector, was constructed to
harbor the right half populations of randomized
oligonucleotides. M13mp18 (Pharmacia, Piscataway, NJ) was
the starting vector. This vector was genetically modified
to contain, in addition to the encoded wild type M13 gene
VIII already present in the vector: (1) a pseudo-wild type
M13 gene VIII sequence with a stop codon (amber) placed
between it and an Eco RI-Sac I cloning site for randomized
oligonucleotides; (2) a pair of Fok I sites to be used for
joining with M13IX22, the left-half vector; (3) a second
amber stop codon placed on the opposite side of the vector
than the portion being combined with the left-half vector;
and (4) various other mutations to remove redundant
restriction sites and the amino terminal portion of Lac Z.

The pseudo-wild type M13 gene VIII was used for
surface expression of random peptides. The pseudo-wild
type gene encodes the identical amino acid sequence as that
of the wild type gene; however, the nucleotide sequence has
been altered so that only 63% identity exists between this
gene and the encoded wild type gene VIII. Modification of
the gene VIII nucleotide sequence used for surface
expression reduces the possibility of homologous
recombination with the wild type gene VIII contained on the
same vector. Additionally, the wild type M13 gene VIII was
retained in the vector system to ensure that at least some
functional, non-fusion coat protein would be produced. The
inclusion of wild type gene VIII therefore reduces the
possibility of non-viable phage production from the random
peptide fusion genes.

The pseudo-wild type gene VIII was constructed by
chemically synthesizing a series of oligonucleotides which
encode both strands of the gene. The oligonucleotides are
presented in Table VII (SEQ ID NOS: 7 through 16).

                         TABLE VII

Top Strand
Oligonucleotides    Sequence (5' to 3')

VIII 03             GATCC TAG GCT GAA GGC GAT
                    GAC CCT GCT AAG GCT GG
VIII 04             A TTC AAT AGT TTA CAG GCA
                    AGT GCT ACT GAG TAC A
VIII 05             TT GGC TAC GCT TGG GCT ATG
                    GTA GTA GTT ATA GTT
VIII 06             GGT GCT ACC ATA GGG ATT AAA
                    TTA TTC AAA AAG TT
VIII 07             T ACG AGC AAG GCT TCT TA

Bottom Strand
Oligonucleotides

VIII 08             AGC TTA AGA AGC CTT GCT CGT
                    AAA CTT TTT GAA TAA TTT
VIII 09             AAT CCC TAT GGT AGC ACC AAC
                    TAT AAC TAC TAC CAT
VIII 10             AGC CCA AGC GTA GCC AAT GTA
                    CTC AGT AGC ACT TG
VIII 11             C CTG TAA ACT ATT GAA TGC
                    AGC CTT AGC AGG GTC
VIII 12             ATC GCC TTC AGC CTA G



WO 92/06176                                   PCT/US91/07141

Except for the terminal oligonucleotides VIII 03 (SEQ
ID NO: 7) and VIII 08 (SEQ ID NO: 12), the above
oligonucleotides (oligonucleotides VIII 04-VIII 07 and 09-
12 (SEQ ID NOS: 8 through 11 and 13 through 16)) were mixed
at 200 ng each in 10 µl final volume and phosphorylated
with T4 polynucleotide kinase (Pharmacia, Piscataway, NJ)
with 1 mM ATP at 37°C for 1 hour. The reaction was stopped
at 65°C for 5 minutes. Terminal oligonucleotides were
added to the mixture and annealed into double-stranded form
by heating to 65°C for 5 minutes, followed by cooling to
room temperature over a period of 30 minutes. The annealed
oligonucleotides were ligated together with 1.0 U of T4 DNA
ligase (BRL). The annealed and ligated oligonucleotides
yield a double-stranded DNA flanked by a Bam HI site at its
5' end and by a Hind III site at its 3' end. A
translational stop codon (amber) immediately follows the
Bam HI site. The gene VIII sequence begins with the codon
GAA (Glu) two codons 3' to the stop codon. The double-
stranded insert was phosphorylated using T4 DNA kinase
(Pharmacia, Piscataway, NJ) and ATP (10 mM Tris-HCl, pH
7.5, 10 mM MgCl2) and cloned in frame with the Eco RI and
Sac I sites within the M13 polylinker. To do so, M13mp18
was digested with Bam HI (New England Biolabs, Beverly,
MA) and Hind III (New England Biolabs) and combined at a
molar ratio of 1:10 with the double-stranded insert. The
ligations were performed at 16°C overnight in 1X ligase
buffer (50 mM Tris-HCl, pH 7.8, 10 mM MgCl2, 20 mM DTT, 1 mM
ATP, 50 µg/ml BSA) containing 1.0 U of T4 DNA ligase (New
England Biolabs). The ligation mixture was transformed
into a host and screened for positive clones using standard
procedures in the art.
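
The overlap between top and bottom strand oligonucleotides can be checked with a few lines of Python. The sketch below is only illustrative: it takes the first 20 nucleotides of VIII 03 and the whole of VIII 12 as they are printed in Table VII, and the helper name `reverse_complement` is an assumption of this example rather than anything named in the specification.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (5' to 3')."""
    return seq.translate(COMPLEMENT)[::-1]

# Sequences as printed in Table VII, with the spacing removed.
viii_03_5prime = "GATCCTAGGCTGAAGGCGAT"  # first 20 nt of top strand VIII 03
viii_12 = "ATCGCCTTCAGCCTAG"             # bottom strand VIII 12

paired_region = reverse_complement(viii_12)
overhang_len = len(viii_03_5prime) - len(paired_region)
overhang = viii_03_5prime[:overhang_len]

# True if VIII 12 pairs with VIII 03 immediately 3' of the overhang.
print(paired_region == viii_03_5prime[overhang_len:])
print(overhang)
```

With the sequences as printed, VIII 12 pairs with VIII 03 and leaves an unpaired four-nucleotide 5' extension on the top strand, which is the kind of end that can anneal to a Bam HI-cut vector.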

Several mutations were generated within the right-half
vector to yield functional M13IX42. The mutations were
generated using the method of Kunkel et al., Meth. Enzymol.
154:367-382 (1987), which is incorporated herein by
reference, for site-directed mutagenesis. The reagents,
strains and protocols were obtained from a Bio Rad
Mutagenesis kit (Bio Rad, Richmond, CA) and mutagenesis was
performed as recommended by the manufacturer.

A Fok I site used for joining the right and left
halves was generated 8 nucleotides 5' to the unique Eco RI
site using the oligonucleotide
5'-CTCGAATTCGTACATC[illegible]GGTCATAGC-3' (SEQ ID NO: 17).
The second Fok I site retained in the vector is naturally
encoded at position 3547; however, the sequence within the
overhang was changed to encode CTTC. Two Fok I sites were
removed from the vector at positions 239 and 7244 of M13mp18
as well as the Hind III site at the end of the pseudo gene
VIII sequence using the mutant oligonucleotides
5'-CATTTTTGCAGATGGCTTAGA-3' (SEQ ID NO: 18) and
5'-TAGCATTAACGTCCAATA-3' (SEQ ID NO: 19), respectively.
New Hind III and Mlu I sites were also introduced at
positions 3919 and 3951 of M13IX42. The oligonucleotides
used for this mutagenesis had the sequences
5'-ATATATTTTAGTAAGCTTCATCTTCT-3' (SEQ ID NO: 20) and
5'-GACAAAGAACGCGTGAAAACTTT-3' (SEQ ID NO: 21),
respectively. The amino terminal portion of Lac Z was
deleted by oligonucleotide-directed mutagenesis using the
mutant oligonucleotide
5'-GCGGGCCTCTTGGCTATTGCTTAAGAAGCCTTGCT-3' (SEQ ID NO: 22).
This deletion also removed a third M13mp18 derived Fok I
site. The distance between the Eco RI and Sac I sites was
increased to ensure complete double digestion by inserting
a spacer sequence. The spacer sequence was inserted using
the oligonucleotide
5'-TTCAGCCTAGGATCCGCCGAGC[illegible]CTCCTACCTGCGAATTCGTACATCC-3'
(SEQ ID NO: 23). Finally, an amber stop codon was placed at
position 4492 using the mutant oligonucleotide
5'-TGGATTATACTTCTAAATAATGGA-3' (SEQ ID NO: 24). The amber
stop codon is used as a biological selection to ensure the
proper recombination of vector sequences to bring together
right and left halves of the randomized oligonucleotides.
In constructing the above mutations, all changes made in an
M13 coding region were performed such that the amino acid
sequence remained unaltered. It should be noted that
several mutations within M13mp18 were found which differed
from the published sequence. Where known, these sequence
differences are recorded herein as found and therefore may
not correspond exactly to the published sequence of
M13mp18.

The sequence of the resultant vector, M13IX42, is
shown in Figure 5 (SEQ ID NO: 1). Figure 3A also shows
M13IX42 where each of the elements necessary for producing
a surface expression library between right and left half
randomized oligonucleotides is marked. The sequence
between the two Fok I sites shown by the arrow is the
portion of M13IX42 which is to be combined with a portion
of the left-half vector to produce random oligonucleotides
as fusion proteins of gene VIII.
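
Because Fok I cuts at a fixed distance outside its recognition sequence, the four-nucleotide overhang it leaves can be predicted from the sequence alone. The Python sketch below illustrates this under stated assumptions: it uses the commonly cited Fok I specificity (recognition sequence GGATG, cutting 9 nucleotides downstream on the strand shown and 13 on the complementary strand), it scans only the strand given, and the test sequence is hypothetical rather than taken from M13IX42 or M13IX22.

```python
FOK_I_SITE = "GGATG"
TOP_CUT_OFFSET = 9      # nucleotides 3' of the site on the strand shown
BOTTOM_CUT_OFFSET = 13  # corresponding cut on the complementary strand

def fok_i_overhangs(seq: str) -> list[tuple[int, str]]:
    """Return (site_start, overhang) for each Fok I site on the given strand.

    The overhang is the stretch between the two staggered cuts, read
    5' to 3' on the strand shown. Sites on the opposite strand are ignored.
    """
    results = []
    start = seq.find(FOK_I_SITE)
    while start != -1:
        site_end = start + len(FOK_I_SITE)
        cut_top = site_end + TOP_CUT_OFFSET
        cut_bottom = site_end + BOTTOM_CUT_OFFSET
        if cut_bottom <= len(seq):
            results.append((start, seq[cut_top:cut_bottom]))
        start = seq.find(FOK_I_SITE, start + 1)
    return results

# Hypothetical test sequence: site, 9-nt spacer, then the overhang CTTC.
example = "AAGGATG" + "ACGTACGTA" + "CTTC" + "GGGTTT"
print(fok_i_overhangs(example))
```

A sketch of this kind shows why the overhang sequence, rather than the recognition site itself, determines which Fok I fragments can be ligated to one another.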

M13IX22, or the left-half vector, was constructed to
harbor the left half populations of randomized
oligonucleotides. This vector was constructed from M13mp19
(Pharmacia, Piscataway, NJ) and contains: (1) two Fok I
sites for mixing with M13IX42 to bring together the left
and right halves of the randomized oligonucleotides; (2)
sequences necessary for expression such as a promoter and
signal sequence and translation initiation signals; (3) an
Eco RI-Sac I cloning site for the randomized
oligonucleotides; and (4) an amber stop codon for
biological selection in bringing together right and left
half oligonucleotides.

Of the two Fok I sites used for mixing M13IX22 with
M13IX42, one is naturally encoded in M13mp18 and M13mp19
(at position 3547). As with M13IX42, the overhang within
this naturally occurring Fok I site was changed to CTTC.
The other Fok I site was introduced after construction of
the translation initiation signals by site-directed
mutagenesis using the oligonucleotide
5'-IAACACTCiTTCCGGATGGAATICTGCAGTCrG3GT-3' (SEQ ID NO: 25).

The translation initiation signals were constructed by
annealing of overlapping oligonucleotides as described
above to produce a double-stranded insert containing a 5'
Eco RI site and a 3' Hind III site. The overlapping
oligonucleotides are shown in Table VIII (SEQ ID NOS: 26
through 34) and were ligated as a double-stranded insert
between the Eco RI and Hind III sites of the vector as
described for the pseudo gene VIII insert. The ribosome
binding site (AGGAGAC) is located in oligonucleotide 015
(SEQ ID NO: 26) and the translation initiation codon (ATG)
is the first three nucleotides of oligonucleotide 016 (SEQ
ID NO: 27).

                         TABLE VIII

Oligonucleotide     Sequence (5' to 3')

015                 AATT C GCC AAG GAG ACA GTC AT
016                 AATG AAA TAC CTA TTG CCT ACG GCA
                    GCC GCT GGA TTG TT
017                 ATTA CTC GCT GCC CAA CCA GCC ATG
                    GCC GAG CTC GTG AT
018                 GACC CAG ACT CCA GATATC CAA CAG
                    GAA TGA GTG TTA AT
019                 TCT AGA ACG CGT C
020                 ACGT G ACG CGT TCT AGA AT TAA
                    CACTCA TTC CTG T
021                 TG GAT ATC TGG AGT CTG GGT CAT
                    CAC GAG CTC GGC CAT G
022                 GC TGG TTG GGC AGC GAG TAA TAA
                    CAA TCC AGC GGC TGC C
023                 GT AGG CAA TAG GTA TTT CAT TAT
                    GAC TGT CCT TGG CG

Oligonucleotide 017 (SEQ ID NO: 27) contained a Sac I
restriction site 67 nucleotides downstream from the ATG
codon. The naturally occurring Eco RI site was removed and
a new site introduced 25 nucleotides downstream from the
Sac I. Oligonucleotides
5'-TGACTGTCTCCTTGGCGTGTGAAATTGTTAT-3' (SEQ ID NO: 35) and
5'-TAACACTCATTCCGGATGGAATTCTGGAGTCTGGGT-3' (SEQ ID NO: 36)
were used to generate each of the mutations, respectively.
An amber stop codon was also introduced at position 3263 of
M13mp18 using the oligonucleotide
5'-CAATTTTATCCTAAATCTTACCAAC-3' (SEQ ID NO: 37).

In addition to the above mutations, a variety of other
modifications were made to remove certain sequences and
redundant restriction sites. The Lac Z ribosome binding
site was removed when the original Eco RI site in M13mp18
was mutated. Also, the Fok I sites at positions 239, 6361
and 7244 of M13mp18 were likewise removed with mutant
oligonucleotides 5'-CATTTTTGCAGATGGCTTAGA-3' (SEQ ID NO:
38), 5'-CGAAAGGGGGGTGTGCTGCAA-3' (SEQ ID NO: 39) and
5'-TAGCATTAACGTCCAATA-3' (SEQ ID NO: 40), respectively.
Again, mutations within the coding region did not alter the
amino acid sequence.

The resultant vector, M13IX22, is 7320 base pairs in
length, the sequence of which is shown in Figure 6 (SEQ ID
NO: 2). The Sac I and Eco RI cloning sites are at
positions 6290 and 6314, respectively. Figure 3A also
shows M13IX22 where each of the elements necessary for
producing a surface expression library between right and
left half randomized oligonucleotides is marked.

Library Construction

Each population of right and left half randomized
oligonucleotides from columns 1R through 40R and columns 1L
through 40L is cloned separately into M13IX42 and M13IX22,
respectively, to create sublibraries of right and left half
randomized oligonucleotides. Therefore, a total of eighty
sublibraries are generated. Separately maintaining each
population of randomized oligonucleotides until the final
screening step is performed to ensure maximum efficiency of
annealing of right and left half oligonucleotides. The
greater efficiency increases the total number of randomized
oligonucleotides which can be obtained. Alternatively, one
can combine all forty populations of right half
oligonucleotides (columns 1R-40R) into one population and
of left half oligonucleotides (columns 1L-40L) into a
second population to generate just one sublibrary for each.
For the generation of sublibraries, each of the above
populations of randomized oligonucleotides is cloned
separately into the appropriate vector. The right half
oligonucleotides are cloned into M13IX42 to generate
sublibraries M13IX42.1R through M13IX42.40R. The left half
oligonucleotides are similarly cloned into M13IX22 to
generate sublibraries M13IX22.1L through M13IX22.40L. Each
vector contains unique Eco RI and Sac I restriction enzyme
sites which produce 5' and 3' single-stranded overhangs,
respectively, when digested. The single strand overhangs
are used for the annealing and ligation of the
complementary single-stranded random oligonucleotides.

The randomized oligonucleotide populations are cloned
between the Eco RI and Sac I sites by sequential digestion
and ligation steps. Each vector is treated with an excess
of Eco RI (New England Biolabs) at 37°C for 2 hours
followed by addition of 4-24 units of calf intestinal
alkaline phosphatase (Boehringer Mannheim, Indianapolis,
IN). Reactions are stopped by phenol/chloroform extraction
and ethanol precipitation. The pellets are resuspended in
an appropriate amount of distilled or deionized water
(dH2O). About 10 pmol of vector is mixed with a 5000-fold
molar excess of each population of randomized
oligonucleotides in 10 µl of 1X ligase buffer (50 mM Tris-
HCl, pH 7.8, 10 mM MgCl2, 20 mM DTT, 1 mM ATP, 50 µg/ml BSA)
containing 1.0 U of T4 DNA ligase (BRL, Gaithersburg, MD).
The ligation is incubated at 16°C for 16 hours. Reactions
are stopped by heating at 75°C for 15 minutes and the DNA
is digested with an excess of Sac I (New England Biolabs)
for 2 hours. Sac I is inactivated by heating at 75°C for
15 minutes and the volume of the reaction mixture is
adjusted to 300 µl with an appropriate amount of 10X ligase
buffer and dH2O. One unit of T4 DNA ligase (BRL) is added
and the mixture is incubated overnight at 16°C. The DNA is
ethanol precipitated and resuspended in TE (10 mM Tris-HCl,
pH 8.0, 1 mM EDTA). DNA from each ligation is
electroporated into XL1 Blue™ cells (Stratagene, La Jolla,
CA), as described below, to generate the sublibraries.
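
The molar quantities in the ligation step can be worked through with a short calculation. The Python sketch below is illustrative only: the function names are hypothetical, the figure of 660 g/mol per base pair is a common approximation for double-stranded DNA rather than a value from the specification, and the vector amount is taken as it reads in the text.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair; approximate, assumed for dsDNA

def insert_pmol_for_excess(vector_pmol: float, fold_excess: float) -> float:
    """Picomoles of oligonucleotide needed for a given molar excess."""
    return vector_pmol * fold_excess

def dsdna_pmol_to_ug(pmol: float, length_bp: int) -> float:
    """Convert picomoles of double-stranded DNA to micrograms."""
    # pmol x (g/mol) gives picograms; 1e-6 converts pg to micrograms.
    return pmol * length_bp * AVG_BP_MASS * 1e-6

# 5000-fold molar excess over the vector amount given in the text.
print(insert_pmol_for_excess(vector_pmol=10, fold_excess=5000))

# Mass of 1 pmol of a 7320 bp vector (the stated length of M13IX22).
print(dsdna_pmol_to_ug(pmol=1, length_bp=7320))
```

The same two helpers apply to the 1:10 vector-to-insert ratio used for the gene VIII insert, by passing a fold excess of 10.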

E. coli XL1 Blue™ is electroporated as described by
Smith et al., Focus 12:38-40 (1990) which is incorporated
herein by reference. The cells are prepared by inoculating
a fresh colony of XL1s into 5 mls of SOB without magnesium
(20 g bacto-tryptone, 5 g bacto-yeast extract, 0.584 g
NaCl, 0.186 g KCl, dH2O to 1,000 mls) and grown with
vigorous aeration overnight at 37°C. SOB without magnesium
(500 ml) is inoculated at 1:1000 with the overnight culture
and grown with vigorous aeration at 37°C until the OD550 is
0.8 (about 2 to 3 h). The cells are harvested by
centrifugation at 5,000 rpm (2,600 x g) in a GS3 rotor
(Sorvall, Newtown, CT) at 4°C for 10 minutes, resuspended
in 500 ml of ice-cold 10% (v/v) sterile glycerol and
centrifuged and resuspended a second time in the same
manner. After a third centrifugation, the cells are
resuspended in 10% sterile glycerol at a final volume of
about 2 ml, such that the OD550 of the suspension is 200 to
300; usually, resuspension is achieved in the 10% glycerol
that remains in the bottle after pouring off the supernate.
Cells are frozen in 40 µl aliquots in microcentrifuge tubes
using a dry ice-ethanol bath and stored frozen at -70°C.

Frozen cells are electroporated by thawing on ice before
use and mixing with [illegible] of vector per 40 µl of cell
suspension. A 40 µl aliquot is placed in an 0.1 cm
electroporation chamber (Bio Rad, Richmond, CA) and pulsed
once at 0°C using a [illegible] parallel resistor, 25 µF,
1.88 kV, which gives a pulse length of about 4 ms. A 10 µl
aliquot of the pulsed cells is diluted into 1 ml SOC
([illegible] SOB plus 1 ml of 2 M [illegible] 2 M glucose)
in a 12- x [illegible] culture tube, and the culture is
shaken at 37°C for 1 hour prior to culturing in selective
media (see below).
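
The pulse settings can be related to one another with two elementary formulas: field strength is voltage divided by the electrode gap, and for an exponentially decaying pulse the time constant equals resistance times capacitance. The Python sketch below applies these to the values as best they can be read from the text (1.88 kV, 0.1 cm gap, 25 µF, about 4 ms); the function names are hypothetical and the readings themselves should be treated as uncertain.

```python
def field_strength_kv_per_cm(voltage_kv: float, gap_cm: float) -> float:
    """Nominal field strength across the electroporation gap."""
    return voltage_kv / gap_cm

def effective_resistance_ohm(tau_s: float, capacitance_f: float) -> float:
    """Effective circuit resistance implied by a pulse time constant.

    For an exponentially decaying pulse, tau = R * C, so R = tau / C.
    R here is the combined resistance of the sample and any parallel
    resistor, not the value of the resistor alone.
    """
    return tau_s / capacitance_f

print(field_strength_kv_per_cm(voltage_kv=1.88, gap_cm=0.1))
print(effective_resistance_ohm(tau_s=4e-3, capacitance_f=25e-6))
```

Since the resistor value cannot be recovered from the text, the second function works backwards from the stated pulse length instead of assuming one.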

Each of the eighty sublibraries is cultured using
methods known to one skilled in the art. Such methods can
be found in Sambrook et al., Molecular Cloning: A
Laboratory Manual, Cold Spring Harbor Laboratory, Cold
Spring Harbor, 1989, and in Ausubel et al., Current
Protocols in Molecular Biology, John Wiley and Sons, New
York, 1989, both of which are incorporated herein by
reference. Briefly, the above 1 ml sublibrary culture
[illegible] pelleted by centrifugation at 10,000 x g. The
supernatant containing phage was transferred to a sterile
tube and stored at 4°C.

Double strand vector DNA containing right and left
half randomized oligonucleotide inserts is isolated from
the cell pellet of each sublibrary. Briefly, the pellet is
washed in TE (10 mM Tris, pH 8.0, 1 mM EDTA) and
recollected by centrifugation at 7,000 rpm for 5 minutes in
a Sorvall centrifuge (Newtown, CT). Pellets are resuspended
in 6 mls of 10% sucrose, 50 mM [illegible]. [illegible]
lysozyme is added and incubated on ice for [illegible].
12 mls of 0.2 M NaOH, 1% SDS is added followed by
[illegible] on ice. The suspensions are then incubated on
ice for 20 minutes after addition of 7.5 mls of 3 M NaOAc,
pH 4.6. The samples are centrifuged at 15,000 rpm for 15
minutes at 4°C, RNased and extracted with
phenol/chloroform, followed by ethanol precipitation. The
pellets are resuspended, weighed and an equal weight of
CsCl is dissolved into each tube until a density of 1.60
g/ml is achieved. EtBr is added to 600 µg/ml and the
double-stranded DNA is isolated by equilibrium
centrifugation in a TV-1665 rotor (Sorvall) at 50,000 rpm
for 6 hours. These DNAs from each right and left half
sublibrary are used to generate forty libraries in which
the right and left halves of the randomized
oligonucleotides have been randomly joined together.

Each of the forty libraries is produced by joining
together one right half and one left half sublibrary. The
two sublibraries joined together correspond to the same
column number for right and left half random
oligonucleotide synthesis. For example, sublibrary
M13IX42.1R is joined with M13IX22.1L to produce the surface
expression library M13IX.1RL. In the alternative situation
where only two sublibraries are generated from the combined
populations of all right half synthesis and all left half
synthesis, only one surface expression library would be
produced.
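
The pairing of sublibraries by column number can be sketched in a few lines of Python. The function name `paired_libraries` is hypothetical; the naming pattern follows the M13IX42.1R, M13IX22.1L and M13IX.1RL example given in the text.

```python
def paired_libraries(n_columns: int = 40) -> list[tuple[str, str, str]]:
    """List (right sublibrary, left sublibrary, joined library) per column."""
    rows = []
    for col in range(1, n_columns + 1):
        right = f"M13IX42.{col}R"   # right half sublibrary
        left = f"M13IX22.{col}L"    # left half sublibrary
        library = f"M13IX.{col}RL"  # surface expression library
        rows.append((right, left, library))
    return rows

rows = paired_libraries()
n_sublibraries = 2 * len(rows)  # one right and one left per column
print(rows[0])
print(len(rows), n_sublibraries)
```

Forty columns therefore give eighty sublibraries and forty joined libraries, in line with the counts stated in the text.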

For the random joining of each right and left half
oligonucleotide population into a single surface
expression vector species, the DNAs isolated from each
sublibrary are digested with an excess of Fok I (New England
Biolabs). The reactions are stopped by phenol/chloroform
extraction, followed by ethanol precipitation. Pellets are
resuspended in dH2O. Each surface expression library is
generated by ligating equal molar amounts (5-10 pmol) of
Fok I-digested DNA isolated from corresponding right and
left half sublibraries in 10 µl of 1X ligase buffer
containing 1.0 U of T4 DNA ligase (Bethesda Research
Laboratories, Gaithersburg, MD). The ligations proceed
overnight at 16°C and are electroporated into the sup O
strain MK30-3 (Boehringer Mannheim Biochemical (BMB),
Indianapolis, IN) as previously described for XL1 cells.
Because MK30-3 is sup O, only the vector portions encoding
the randomized oligonucleotides which come together will
produce viable phage.

Screening of Surface Expression Libraries

Purified phage are prepared from 50 ml liquid cultures
of XL1 Blue™ cells (Stratagene) which are infected at a
m.o.i. of 10 from the phage stocks stored at 4°C. The
cultures are induced with 2 mM IPTG. Supernatants from all
cultures are combined and cleared by two centrifugations
and the phage are precipitated by adding 1/7.5 volumes of
PEG solution ([illegible] PEG-8000, [illegible]), followed
by incubation at 4°C overnight. The precipitate is recovered
by centrifugation for 90 minutes at 10,000 x g. Phage
pellets are resuspended in 25 ml of 0.01 M Tris-HCl, pH
[illegible], [illegible] EDTA, and 0.1% Sarkosyl and then
shaken slowly at room temperature for 30 minutes. The
solutions are adjusted to 0.5 M NaCl and to a final
concentration of [illegible] polyethylene glycol. After 2
hours at 4°C, the precipitates containing the phage are
recovered by centrifugation for 1 hour at 15,000 x g. The
precipitates are resuspended in 10 ml of NET buffer (0.1 M
NaCl, 1.0 mM EDTA, and 0.01 M Tris-HCl, pH 7.6), mixed
well, and the phage repelleted by centrifugation at
170,000 x g for 3 hours. The phage pellets are subsequently
resuspended [illegible] in 2 ml of NET buffer and subjected
to cesium chloride centrifugation for 18 hours at 110,000
x g ([illegible] cesium chloride in 10 ml of buffer). Phage
bands are [illegible] 170,000 x g for 3 hours, resuspended,
[illegible] 0.3 ml of NET buffer containing 0.1 mM sodium
azide.

Ligand binding proteins used for panning on
streptavidin coated dishes are first biotinylated and then
absorbed against UV-inactivated blocking phage (see below).
The biotinylating reagents are dissolved in
dimethylformamide at a ratio of 2.4 mg solid NHS-SS-Biotin
(sulfosuccinimidyl 2-(biotinamido)ethyl-1,3'-
dithiopropionate; Pierce, Rockford, IL) to 1 ml solvent and
used as recommended by the manufacturer. Small-scale
reactions are accomplished by mixing 1 µl dissolved reagent
with 43 µl of 1 mg/ml ligand binding protein diluted in
sterile bicarbonate buffer (0.1 M NaHCO3, pH 8.6). After 2
hours at 25°C, residual biotinylating reagent is reacted
with 500 µl 1 M ethanolamine (pH adjusted to 9 with HCl)
for an additional 2 hours. The entire sample is diluted
with 1 ml TBS containing 1 mg/ml BSA, concentrated to about
50 µl on a Centricon 30 ultra-filter (Amicon), and washed
on the same filter three times with 2 ml TBS and once with
1 ml TBS containing 0.02% NaN3 and 7 x 10^[illegible]
UV-inactivated blocking phage (see below); the final
retentate (60-80 µl) is stored at 4°C. Ligand binding
proteins biotinylated with the NHS-SS-Biotin reagent are
linked to biotin via a disulfide-containing chain.
UV-irradiated M13 phage were used for blocking binding
proteins which fortuitously bound filamentous phage in
general. M13mp8 (Messing and Vieira, Gene 19: 262-276
(1982), which is incorporated herein by reference) was
chosen because it carries two amber stop codons, which
ensure that the few phage surviving irradiation will not
grow in the sup O strains used to titer the surface
expression libraries. A 5 ml sample containing 5 x 10^13
M13mp8 phage, purified as described above, was placed in a
small petri plate and irradiated with a germicidal lamp at
a distance of two feet for 7 minutes (flux 150 µW/cm2).
NaN3 was added to 0.02% and phage particles concentrated to
10^[illegible] particles/ml on a Centricon 30-kDa
ultrafilter (Amicon).

For panning, polystyrene petri plates (60 x 15 mm,
Falcon; Becton Dickinson, Lincoln Park, NJ) are incubated
with 1 ml of 1 mg/ml of streptavidin (BMB) in 0.1 M NaHCO3,
pH 8.6-0.02% NaN3 in a small, air-tight plastic box
overnight in a cold room. The next day streptavidin is
removed and replaced with at least [illegible] blocking
solution (29 mg/ml of BSA; 3 µg/ml of streptavidin; 0.1 M
NaHCO3, pH 8.6-0.02% NaN3) and incubated at least 1 hour at
room temperature. The blocking solution is removed and
plates are washed rapidly three times with Tris buffered
saline containing 0.5% Tween 20 (TBS-0.5% Tween 20).

Selection of phage expressing peptides bound by the
ligand binding proteins is performed with 5 µl (2.7 µg
ligand binding protein) of blocked biotinylated ligand
binding proteins reacted with a 50 µl portion of each
library. Each mixture is incubated overnight at 4°C,
diluted with 1 ml TBS-0.5% Tween 20, and transferred to a
streptavidin-coated petri plate prepared as described
above. After rocking 10 minutes at room temperature,
unbound phage are removed and plates washed ten times with
TBS-0.5% Tween 20 over a period of 30-90 minutes. Bound
phage are eluted from plates with 800 µl sterile elution
buffer (1 mg/ml BSA, 0.1 M HCl, pH adjusted to 2.2 with
glycine) for 15 minutes and eluates neutralized with 48 µl
2 M Tris (pH unadjusted). A 20 µl portion of each eluate
is titered on MK30-3 concentrated cells with dilutions of
input phage.
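
Titering the eluate alongside the input phage allows the fraction of phage recovered in a round of panning to be estimated. The Python sketch below shows one way such a calculation could be set up. The function names, plaque counts and dilution factors are hypothetical; only the eluate volume (800 µl plus 48 µl of neutralizing Tris) and the 50 µl library portion come from the text.

```python
def titer_pfu_per_ml(plaques: int, dilution_factor: float,
                     plated_volume_ml: float) -> float:
    """Plaque-forming units per ml of the undiluted sample."""
    return plaques * dilution_factor / plated_volume_ml

def percent_recovery(output_titer: float, output_volume_ml: float,
                     input_titer: float, input_volume_ml: float) -> float:
    """Percentage of input phage recovered in the eluate."""
    output_pfu = output_titer * output_volume_ml
    input_pfu = input_titer * input_volume_ml
    return 100.0 * output_pfu / input_pfu

# Hypothetical plaque counts and dilutions, for illustration only.
eluate_titer = titer_pfu_per_ml(plaques=50, dilution_factor=1e3,
                                plated_volume_ml=0.02)
input_titer = titer_pfu_per_ml(plaques=100, dilution_factor=1e8,
                               plated_volume_ml=0.01)

recovery = percent_recovery(eluate_titer, 0.848, input_titer, 0.05)
print(eluate_titer, input_titer, recovery)
```

Comparing this figure between successive rounds of panning gives a simple indication of whether binding phage are being enriched.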

A second round of panning is performed by treating 750
µl of first eluate from each library with 5 mM DTT for 10
minutes to break disulfide bonds linking biotin groups to
residual biotinylated binding proteins. The treated eluate
is concentrated on a Centricon 30 ultrafilter (Amicon),
washed three times with TBS-0.5% Tween 20, and concentrated
to a final volume of about 50 µl. Final retentate is
transferred to a tube containing 5.0 µl (2.7 µg ligand
binding protein) blocked biotinylated ligand binding
proteins and incubated overnight. The solution is diluted
with 1 ml TBS-0.5% Tween 20, panned, and eluted as
described above on fresh streptavidin-coated petri plates.
The entire second eluate (800 µl) is neutralized with 48 µl
2 M Tris, and 20 µl is titered simultaneously with the
first eluate and dilutions of the input phage.

Individual phage populations are purified through 2 to
3 rounds of plaque purification. Briefly, the second
eluate titer plates are lifted with nitrocellulose filters
(Schleicher & Schuell, Inc., Keene, NH) and processed by
washing for 15 minutes in TBS (10 mM Tris-HCl, pH 7.2, 150
mM NaCl), followed by an incubation with shaking for an
additional 1 hour at 37°C with TBS containing 5% nonfat dry
milk (TBS-5% NDM) at 0.5 ml/cm2. The wash is discarded and
fresh TBS-5% NDM is added (0.1 ml/cm2) containing the ligand
binding protein between 1 nM to 100 mM, preferably between
1 to 100 [illegible]. All incubations are carried out in
heat-sealable pouches (Sears). Incubation with the ligand
binding protein proceeds for 12-16 hours at 4°C with
shaking. The filters are removed from the bags and washed
3 times for 30 minutes at room temperature with 150 mls of
TBS containing 0.1% NDM and 0.2% NP-40 (Sigma, St. Louis,
MO). The filters are then incubated for 2 hours at room
temperature in antiserum against the ligand binding protein
at an appropriate dilution in TBS-0.5% NDM, washed in 3
changes of TBS containing 0.1% NDM and 0.2% NP-40 as
described above and incubated in TBS containing 0.1% NDM
and 0.2% NP-40 with 1 x 10^[illegible] cpm of 125I-labeled
Protein A (specific activity = 2.1 x 10^[illegible]
cpm/µg). After a washing with TBS containing 0.1% NDM and
0.2% NP-40 as described above, the filters are wrapped in
Saran Wrap and exposed to Kodak X-Omat x-ray film (Kodak,
Rochester, NY) for 1-12 hours at -70°C using Dupont Cronex
Lightning Plus Intensifying Screens (Dupont, Wilmington,
DE).

Positive plaques identified are cored with the large
end of a pasteur pipet and placed into 1 ml of SM (5.8 g
NaCl, 2 g MgSO4·7H2O, 50 ml 1 M Tris-HCl, pH 7.5, 5 mls 2%
gelatin, to 1000 mls with dH2O) plus 1-3 drops of CHCl3 and
incubated at 37°C 2-3 hours or overnight at 4°C. The phage
are diluted 1:500 in SM and [illegible] cells plus 3 mls of
soft agar per 100 mm plate. The XL1 cells are prepared for
plating by growing a colony overnight in 10 ml LB (10 g
bacto-tryptone, 5 g bacto-yeast extract, 10 g NaCl, 1000 ml
dH2O) containing 100 µl of 20% maltose and 100 µl of 1 M
MgSO4. The bacteria are pelleted by centrifugation at
2,000 x g for 10 minutes and the pellet is resuspended
gently in 10 mls of 10 mM MgSO4. The suspension is diluted
4-fold by adding 30 mls of 10 mM MgSO4 to give an OD of
approximately 0.5. The second and third round screens are
identical to that described above except that the plaques
are cored with the small end of a pasteur pipet and placed
into 0.5 mls SM plus a drop of CHCl3 and 1-5 µl of the
phage following incubation are used for plating without
dilution. At the end of the third round of purification,
an individual plaque is picked and the templates prepared
for sequencing.

Template Preparation and Sequencing

Templates are prepared for sequencing by inoculating
a 1 ml culture of 2XYT containing a 1:100 dilution of an
overnight culture of XL1 with an individual plaque. The
plaques are picked using a sterile toothpick. The culture
is incubated at 37°C for 5-6 hours with shaking and then
transferred to a 1.5 ml microfuge tube. 200 µl of PEG
solution is added, followed by vortexing, and the tube is
placed on ice for 10 minutes. The phage precipitate is
recovered by centrifugation in a microfuge at 12,000 x g
for 5 minutes. The supernatant is discarded and the pellet
is resuspended in 230 µl of TE (10 mM Tris-HCl, pH 7.5,
1 mM EDTA) by gently pipeting with a yellow pipet tip.
Phenol (200 µl) is added, followed by a brief vortex, and
the tube is microfuged to separate the phases. The aqueous
phase is transferred to a separate tube and extracted with
200 µl of phenol/chloroform (1:1) as described above for
the phenol extraction. A 0.1 volume of 3 M NaOAc is added,
followed by addition of 2.5 volumes of ethanol, and the
templates are precipitated at -20°C for 20 minutes. The
precipitated templates are recovered by centrifugation in a
microfuge at 12,000 x g for 8 minutes. The pellet is washed
in 70% ethanol, dried and resuspended in 25 µl TE.
Sequencing was performed using a Sequenase™ sequencing kit
following the protocol supplied by the manufacturer (U.S.
Biochemical, Cleveland, OH).



EXAMPLE II

Isolation and Characterization of Peptide Ligands Generated
From Oligonucleotides Having Random Codons at Two
Predetermined Positions

This example shows the generation of a surface
expression library from a population of oligonucleotides
having randomized codons. The oligonucleotides are ten
codons in length and are cloned into a single vector
species for the generation of an M13 gene VIII-based surface
expression library. The example also shows the selection
of peptides for a ligand binding protein and
characterization of their encoded nucleic acid sequences.

Oligonucleotide Synthesis

Oligonucleotides were synthesized as described in
Example I. The synthesizer was programmed to synthesize
the sequences shown in Table IX. These sequences
correspond to the first random codon position synthesized
and 3' flanking sequences of the oligonucleotide which
hybridizes to the leader sequence in the vector. The
complementary sequences are used for insertional
mutagenesis of the synthesized population of
oligonucleotides.

Table IX

Column       Sequence (5' to 3')

column 1     AA(A/C)GGTTGGTCGGTACCGG
column 2     AG(A/G)GGTTGGTCGGTACCGG
column 3     AT(A/G)GGTTGGTCGGTACCGG
column 4     AC(A/G)GGTTGGTCGGTACCGG
column 5     CA(G/T)GGTTGGTCGGTACCGG
column 6     CT(G/C)GGTTGGTCGGTACCGG
column 7     AG(T/C)GGTTGGTCGGTACCGG
column 8     AT(T/C)GGTTGGTCGGTACCGG
column 9     CC(A/C)GGTTGGTCGGTACCGG
column 10    T(A/T)TGGTTGGTCGGTACCGG
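The pairing of codons in the ten columns can be checked with a short sketch. It assumes, as the text implies for oligonucleotides inserted by mutagenesis into a single-stranded vector, that the tabulated triplets are the antisense strand, so the encoded codon is read from the reverse complement. The helper names (`expand`, `reverse_complement`, `encoded_amino_acids`) are illustrative and not part of the original disclosure.

```python
import re
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# Variable triplets of Table IX, written 5' to 3' as synthesized.
COLUMNS = ["AA(A/C)", "AG(A/G)", "AT(A/G)", "AC(A/G)", "CA(G/T)",
           "CT(G/C)", "AG(T/C)", "AT(T/C)", "CC(A/C)", "T(A/T)T"]


def expand(pattern):
    """List every sequence described by a pattern such as 'AA(A/C)'."""
    tokens = re.findall(r"\(([ACGT/]+)\)|([ACGT])", pattern)
    choices = [mixed.split("/") if mixed else [single]
               for mixed, single in tokens]
    return ["".join(p) for p in product(*choices)]


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def encoded_amino_acids(pattern):
    """Amino acids encoded on the complementary (sense) strand."""
    return [CODON_TABLE[reverse_complement(t)] for t in expand(pattern)]


for number, pattern in enumerate(COLUMNS, start=1):
    print(f"column {number}: {pattern} -> {encoded_amino_acids(pattern)}")
```

Under that assumption each column contributes two amino acids and the ten columns together cover all twenty without a stop codon.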

The next eight random codon positions were synthesized
as described for Table V in Example I. Following the ninth
position synthesis, the reaction products were once more
combined, mixed and redistributed into 10 new reaction
columns. Synthesis of the last random codon position and
5' flanking sequences are shown in Table X.

Table X

Column       Sequence (5' to 3')

column 1     AGGATCCGCCGAGCTCAA(A/C)A
column 2     AGGATCCGCCGAGCTCAG(A/G)A
column 3     AGGATCCGCCGAGCTCAT(A/G)A
column 4     AGGATCCGCCGAGCTCAC(A/G)A
column 5     AGGATCCGCCGAGCTCCA(G/T)A
column 6     AGGATCCGCCGAGCTCCT(G/C)A
column 7     AGGATCCGCCGAGCTCAG(T/C)A
column 8     AGGATCCGCCGAGCTCAT(T/C)A
column 9     AGGATCCGCCGAGCTCCC(A/C)A
column 10    AGGATCCGCCGAGCTCT(A/T)TA

The reaction products were mixed once more and the
oligonucleotides cleaved and purified as recommended by the
manufacturer. The purified population of oligonucleotides
was used to generate a surface expression library as
described below.
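The combine, mix and redistribute cycle described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: beads are modeled as strings, each of the ten columns couples one of its two triplets with equal probability, and the pool is divided evenly. The function name `split_and_mix` and the bead counts are hypothetical.

```python
import random
from collections import Counter

# The two triplets coupled in each of the ten columns
# (variable positions of Table IX, written 5' to 3').
COLUMN_TRIPLETS = [
    ("AAA", "AAC"), ("AGA", "AGG"), ("ATA", "ATG"), ("ACA", "ACG"),
    ("CAG", "CAT"), ("CTG", "CTC"), ("AGT", "AGC"), ("ATT", "ATC"),
    ("CCA", "CCC"), ("TAT", "TTT"),
]


def split_and_mix(n_beads, n_positions, seed=0):
    """Simulate repeated pooling and equal division of support beads."""
    rng = random.Random(seed)
    beads = ["" for _ in range(n_beads)]
    for _ in range(n_positions):
        rng.shuffle(beads)                          # combine and mix
        for index in range(n_beads):
            column = index % len(COLUMN_TRIPLETS)   # equal redistribution
            triplet = rng.choice(COLUMN_TRIPLETS[column])
            beads[index] = triplet + beads[index]   # chain grows at its 5' end
    return beads


library = split_and_mix(n_beads=20000, n_positions=3, seed=1)
first_position = Counter(seq[:3] for seq in library)
for triplet, count in sorted(first_position.items()):
    print(triplet, count)
```

Because every bead passes through exactly one column per cycle, each of the twenty triplets appears at each position with roughly equal frequency, independent of what was coupled in earlier cycles.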

Vector Construction 

The vector used for generating surface expression
libraries from a single oligonucleotide population (i.e.,
without joining together of right and left half
oligonucleotides) is described below. The vector is an
M13-based expression vector which directs the synthesis of
gene VIII-peptide fusion proteins (Figure 4). This vector
exhibits all the functions that the combined right and left
half vectors of Example I exhibit.

An M13-based vector was constructed for the cloning
and surface expression of populations of random
oligonucleotides (Figure 4, M13IX30). M13mp19 (Pharmacia)
was the starting vector. This vector was modified to
contain, in addition to the encoded wild type M13 gene
VIII: (1) a pseudo-wild type gene VIII sequence with
an amber stop codon placed between it and the restriction
sites for cloning oligonucleotides; (2) Stu I, Spe I and
Xho I restriction sites in frame with the pseudo-wild type
gVIII for cloning oligonucleotides; (3) sequences necessary
for expression, such as a promoter, signal sequence and
translation initiation signals; (4) various other mutations
to remove redundant restriction sites and the amino
terminal portion of Lac Z.

Construction of M13IX30 was performed in four steps.
In the first step, a precursor vector containing the pseudo
gene VIII and various other mutations was constructed,
M13IX01F. The second step involved the construction of a
small cloning site in a separate M13mp18 vector to yield
M13IX03. In the third step, expression sequences and
cloning sites were constructed in M13IX03 to generate the
intermediate vector M13IX04B. The fourth step involved the
incorporation of the newly constructed sequences from the
intermediate vector into M13IX01F to yield M13IX30.
Incorporation of these sequences linked them with the
pseudo gene VIII.

Construction of the precursor vector M13IX01F was
similar to that of M13IX42 described in Example I except
for the following features: (1) M13mp19 was used as the
starting vector; (2) the Fok I site 5' to the Eco RI site
was not incorporated and the overhang at the naturally
occurring Fok I site at position 3547 was not changed to
5'-[illegible]-3'; (3) the spacer sequence was not
incorporated between the Eco RI and Sac I sites; and (4)
the amber codon at position 4492 was not incorporated.

In the second step, M13mp18 was mutated to remove the
5' end of Lac Z up to the Lac i binding site and including
the Lac Z ribosome binding site and start codon.
Additionally, the polylinker was removed and a Mlu I site
was introduced in the coding region of Lac Z. An
oligonucleotide was used for this mutagenesis and had the
sequence 5'-[sequence illegible]-3' (SEQ ID NO: 41).
Restriction enzyme sites for Hind III and Eco RI were
introduced downstream of the Mlu I site using the
oligonucleotide
5'-[illegible]AATTCTGCAACGCGATTAAGCTTGCGTAACGCC-3' (SEQ ID
NO: 42). These modifications of M13mp18 yielded the vector
M13IX03.

The expression sequences and cloning sites were
introduced into M13IX03 by chemically synthesizing a series
of oligonucleotides which encode both strands of the
desired sequence. The oligonucleotides are presented in
Table XI (SEQ ID NOS: 43 through 50).

Table XI
M13IX30 Oligonucleotide Series

Top Strand
Oligonucleotides    Sequence (5' to 3')

084                 GGCGTTAGCCAAGCTTTGTACATGGAGAAAATAAAG
027                 TGAAACAAAGCACTATTGCACTGGCACTCTTACCGTTACCGT
028                 TACTGTTTACCCCTGTGACAAAAGCCGCCCAGGTCCAGCTGC
029                 TCGAGTCAGGCCTATTGTGCCCAGGGATTGTACTAGTGGATCCG

Bottom Strand
Oligonucleotides    Sequence (5' to 3')

085                 TGGCGAAAGGGAATTCGGATCCACTAGTACAATCCCTG
031                 GGCACAATAGGCCTGACTCGAGCAGCTGGACCAGGGCGGCTT
032                 TTGTCACAGGGGTAAACAGTAACGGTAACGGTAAGTGTGCCA
033                 GTGCAATAGTGCTTTGTTTCACTTTATTTTCTCCATGTACAA

The above oligonucleotides except for the terminal
oligonucleotides 084 (SEQ ID NO: 43) and 085 (SEQ ID NO:
47) of Table XI were mixed, phosphorylated, annealed and
ligated to form a double stranded insert as described in
Example I. However, instead of cloning directly into the
intermediate vector the insert was first amplified by PCR
using the terminal oligonucleotides 084 (SEQ ID NO: 43) and
085 (SEQ ID NO: 47) as primers. The terminal
oligonucleotide 084 (SEQ ID NO: 43) contains a Hind III
site 10 nucleotides internal to its 5' end.
Oligonucleotide 085 (SEQ ID NO: 47) has an Eco RI site at
its 5' end. Following amplification, the products were
restricted with Hind III and Eco RI and ligated as
described in Example I into the polylinker of M13mp18
digested with the same two enzymes. The resultant double
stranded insert contained a ribosome binding site, a
translation initiation codon followed by a leader sequence
and three restriction enzyme sites for cloning random
oligonucleotides (Xho I, Stu I, Spe I). The vector was
named M13IX04.
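The staggered design that lets the top and bottom strand oligonucleotides anneal into one insert can be checked directly from the sequences. The sketch below uses only oligonucleotides 084, 027 and 033 as they read in Table XI; the variable and function names are illustrative, and the sequences carry whatever transcription errors the table itself contains.

```python
def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Sequences as listed in Table XI (5' to 3').
OLIGO_084 = "GGCGTTAGCCAAGCTTTGTACATGGAGAAAATAAAG"        # top strand, terminal
OLIGO_027 = "TGAAACAAAGCACTATTGCACTGGCACTCTTACCGTTACCGT"  # top strand
OLIGO_033 = "GTGCAATAGTGCTTTGTTTCACTTTATTTTCTCCATGTACAA"  # bottom strand

# Adjacent top strand sequences laid end to end.
top_strand = OLIGO_084 + OLIGO_027

# Position on the top strand where the bottom strand oligonucleotide pairs.
site = top_strand.find(reverse_complement(OLIGO_033))
junction = len(OLIGO_084)
print("033 pairs with top strand positions", site, "to", site + len(OLIGO_033))
print("spans the 084/027 junction:", site < junction < site + len(OLIGO_033))

# Hind III recognition sequence (AAGCTT) within oligonucleotide 084.
print("Hind III site offset in 084:", OLIGO_084.find("AAGCTT"))
```

Oligonucleotide 033 pairs across the boundary between the 084 and 027 sequences, and the Hind III site in 084 begins 10 nucleotides from the 5' end, in agreement with the description above.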

During cloning of the double-stranded insert, it was
found that one of the GCC codons in oligonucleotide 028
and its complement in 031 was deleted. Since this deletion
did not affect function, the final construct is missing one
of the two GCC codons. Additionally, oligonucleotide 032
contained a GTG codon where a GAG codon was needed.
Mutagenesis was performed using the oligonucleotide 5'-
TAACGGTAAGAGTGCCAGTGC-3' (SEQ ID NO: 51) to convert the
codon to the desired sequence. The resultant intermediate
vector was named M13IX04B.

The fourth step in constructing M13IX30 involved
inserting the expression and cloning sequences from
M13IX04B upstream of the pseudo-wild type gVIII in
M13IX01F. This was accomplished by digesting M13IX04B with
Dra III and Bam HI and gel isolating the 700 base pair
insert containing the sequences of interest. M13IX01F was
likewise digested with Dra III and Bam HI. The insert was
combined with the double digested vector at a molar ratio
of 3:1 and ligated as described in Example I. It should be
noted that all modifications in the vectors described
herein were confirmed by sequence analysis. The sequence
of the final construct, M13IX30, is shown in Figure 7 (SEQ
ID NO: 3). Figure 4 also shows M13IX30 where each of the
elements necessary for surface expression of randomized
oligonucleotides is marked.

Library Construction, Screening and Characterization of
Encoded Oligonucleotides

Construction of an M13IX30 surface expression library
is accomplished identically to that described in Example I
for sublibrary construction except the oligonucleotides
described above are inserted into M13IX30 by mutagenesis
instead of by ligation. The library is constructed and
propagated on MK30-3 (BMB) and phage stocks are prepared
for infection of XL1 cells and screening. The surface
expression library is screened and encoding
oligonucleotides characterized as described in Example I.

EXAMPLE III

Isolation and Characterization of Peptide Ligands
Generated from Right and Left Half
Degenerate Oligonucleotides

This example shows the construction and expression
of a surface expression library of degenerate
oligonucleotides. The encoded peptides of this example
derive from the mixing and joining together of two
separate oligonucleotide populations. Also demonstrated
is the isolation and characterization of peptide ligands
and their corresponding nucleotide sequence for specific
binding proteins.

Synthesis of Oligonucleotide Populations

A population of left half degenerate
oligonucleotides and a population of right half
degenerate oligonucleotides were synthesized using
standard automated procedures as described in Example I.

The degenerate codon sequences for each population
of oligonucleotides were generated by sequentially
synthesizing the triplet NNG/T where N is an equal
mixture of all four nucleotides. The antisense sequence
for each population of oligonucleotides was synthesized
and each population contained 5' and 3' flanking
sequences complementary to the vector sequence. The
complementary termini were used to incorporate each
population of oligonucleotides into their respective
vectors by standard mutagenesis procedures. Such
procedures have been described previously in Example I
and in the Detailed Description. Synthesis of the
antisense sequence of each population was necessary since
the single-stranded forms of the vectors are obtained only
as the sense strand.
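The coding capacity of the NNG/T triplet, and its relation to the (A/C)NN form that appears in the antisense oligonucleotides, can be enumerated as follows. This is a minimal sketch using the standard genetic code; the function names are illustrative.

```python
from collections import Counter
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def nnk_codons():
    """All sense codons of the form NN(G/T)."""
    return ["".join(p) for p in product("ACGT", "ACGT", "GT")]


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


codons = nnk_codons()
usage = Counter(CODON_TABLE[c] for c in codons)
stop_codons = [c for c in codons if CODON_TABLE[c] == "*"]
antisense_first_bases = {reverse_complement(c)[0] for c in codons}

print("codons:", len(codons))
print("amino acids encoded:", len(set(usage) - {"*"}))
print("stop codons:", stop_codons)
print("first base of antisense triplets:", sorted(antisense_first_bases))
```

Restricting the third position to G or T leaves 32 codons that still encode all twenty amino acids while retaining only the amber stop codon (TAG). On the synthesized antisense strand every such codon begins with A or C, which is the (A/C)NN form.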

The left half oligonucleotide population was
synthesized having the following sequence: 5'-
AGCTCCCGGATGCCTCAGAAGATG(A/CNN)9GGCTTTTGCCACAGGGG-3' (SEQ
ID NO: 52). The right half oligonucleotide population
was synthesized having the following sequence: 5'-
[illegible]GCCTCGGATCCGCC(A/CNN)10ATG(A/C)GAAT-3' (SEQ ID
NO: 53). These two oligonucleotide populations when
incorporated into their respective vectors and joined
together encode a 20 codon oligonucleotide having 19
degenerate positions and an internal predetermined codon
sequence.
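The theoretical size of the joined library follows from the number of degenerate positions. The arithmetic below is a sketch only: it assumes 32 codons and 20 amino acids per NNG/T position, as in the synthesis described above, and it ignores clones interrupted by the amber stop codon.

```python
LEFT_POSITIONS = 9        # (A/CNN)9 in the left half population
RIGHT_POSITIONS = 10      # (A/CNN)10 in the right half population
CODONS_PER_POSITION = 32  # NN(G/T)
AMINO_ACIDS_PER_POSITION = 20

positions = LEFT_POSITIONS + RIGHT_POSITIONS
nucleotide_sequences = CODONS_PER_POSITION ** positions
peptide_sequences = AMINO_ACIDS_PER_POSITION ** positions

print(f"degenerate positions: {positions}")
print(f"possible coding sequences: {nucleotide_sequences:.3e}")
print(f"possible peptides (no stop codons): {peptide_sequences:.3e}")
```

The totals come to roughly 4 x 10^28 coding sequences and 5 x 10^24 peptides, so any physical library samples only a small fraction of the theoretical diversity.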

Vector Construction

Modified forms of the previously described vectors
were used for the construction of right and left half
sublibraries. The construction of left half sublibraries
was performed in an M13-based vector termed M13ED03.
This vector is a modified form of the previously
described M13IX30 vector and contains all the essential
features of both M13IX30 and M13IX22. M13ED03 contains,
in addition to a wild type and a pseudo-wild type gene
VIII, sequences necessary for expression and two Fok I
sites for joining with a right half oligonucleotide
sublibrary. Therefore, this vector combines the
advantages of both previous vectors in that it can be
used for the generation and expression of surface
expression libraries from a single oligonucleotide
population or it can be joined with a sublibrary to bring
together right and left half oligonucleotide populations
into a surface expression library.

M13ED03 was constructed in two steps from M13IX30.
The first step involved the modification of M13IX30 to
remove a redundant sequence and to incorporate a sequence
encoding the eight amino-terminal residues of human
β-endorphin. The leader sequence was also mutated to
increase secretion of the product.

During construction of M13IX04 (an intermediate
vector to M13IX30 which is described in Example II), a
six nucleotide sequence was duplicated in oligonucleotide
027 (SEQ ID NO: 44) and its complement 032 (SEQ ID NO:
49). This sequence, 5'-TTACCG-3', was deleted by
mutagenesis in the construction of M13ED01. The
oligonucleotide used for the mutagenesis was 5'-
GGTAAACAGTAACGGTAAGAGTGCCAG-3' (SEQ ID NO: 54). The
mutation in the leader sequence was generated using the
oligonucleotide 5'-GGGCTTTTGCCAGAGGGGT-3' (SEQ ID NO:
55). This mutagenesis resulted in the A residue at
position 6353 of M13IX30 being changed to a G residue.
The resultant vector was designated M13IX32.

To generate M13ED01, the nucleotide sequence
encoding β-endorphin (8 amino acid residues of
β-endorphin plus 3 extra amino acid residues) was
incorporated after the leader sequence by mutagenesis.
The oligonucleotide used had the following sequence: 5'-
AGGGTCATCGCCTTCAGCTCCGGATCCCTCAGAAGTCATAAACCCCCCATAGGC
TTTTGCCAC-3' (SEQ ID NO: 56). This mutagenesis also
removed some of the downstream sequences through the Spe
I site.



The second step in the construction of M13ED03
involved vector changes which put the β-endorphin
[text illegible] to incorporate oligonucleotide populations
by mutagenesis using sequences complementary to those
flanking or overlapping with the encoded sequence. The
absence of β-endorphin expression after mutagenesis can
therefore be used to measure the mutagenesis frequency. In
addition to the above vector changes, M13ED03 was also
modified to contain an amber codon at position
3[illegible] for biological selection during joining of
right and left half sublibraries.

The mutations were incorporated using standard
mutagenesis procedures as described in Example I. The
frame shift changes and Fok I site were generated using
[text illegible] (SEQ ID NO: 57). The amber codon was
generated using the oligonucleotide [sequence illegible]
(SEQ ID NO: 58). The full sequence of the resultant vector,
M13ED03, is provided in Figure 8 (SEQ ID NO: 4).

The construction of right half oligonucleotide
sublibraries was performed in a modified form of the
[text illegible]. This change ensures that all expression
[text illegible] promoter produces a peptide-gene VIII
fusion protein. Removal of the amber codon [text illegible]
mutagenesis using the oligonucleotide
5'-[illegible]AGCCTCGGATCCGCC-3' (SEQ ID NO: 59). The full
sequence of M13IX421 is shown in Figure 9 (SEQ ID NO: 5).

Library Construction, Screening and Characterization of
Encoded Oligonucleotides

A sublibrary was constructed for each of the
previously described degenerate populations of
oligonucleotides. The left half population of
oligonucleotides was incorporated into M13ED03 to
generate the sublibrary M13ED03.L and the right half
population of oligonucleotides was incorporated into
M13IX421 to generate the sublibrary M13IX421.R. Each of
the oligonucleotide populations was incorporated into
its respective vector using site-directed mutagenesis
as described in Example I. Briefly, the nucleotide
sequences flanking the degenerate codon sequences were
complementary to the vector at the site of incorporation.
The populations of nucleotides were hybridized to single-
stranded M13ED03 or M13IX421 vectors and extended with T4
DNA polymerase to generate a double-stranded circular
vector. Mutant templates were obtained by uridine
selection in vivo, as described by Kunkel et al., supra.
Each of the vector populations was electroporated into
host cells and propagated as described in Example I.

The random joining of right and left half
sublibraries into a single surface expression library was
accomplished as described in Example I except that prior
to digesting each vector population with Fok I they were
first digested with an enzyme that cuts in the unwanted
portion of each vector. Briefly, M13ED03.L was digested
with Bgl II (cuts at 7094) and M13IX421.R was digested
with Hind III (cuts at 3919). Each of the digested
populations was further treated with alkaline
phosphatase to ensure that the ends would not religate
and then digested with an excess of Fok I. Ligations,
electroporation and propagation of the resultant library
were performed as described in Example I.


The surface expression library was screened for
ligand binding proteins using a modified panning
procedure. Briefly, 1 ml of the library, about 10 phage
particles, was added to 1-5 μg of the ligand binding
protein. The ligand binding protein was either an
antibody or receptor globulin (Rg) molecule, Aruffo et
al., Cell 61:1303-1313 (1990), which is incorporated
herein by reference. Phage were incubated with shaking
with affinity ligand at room temperature for 1 to 3 hours
followed by the addition of 200 μl of latex beads
(Biosite, San Diego, CA) which were coated with goat-
antimouse IgG. This mixture was incubated with shaking for
an additional 1-2 hours at room temperature. Beads were
pelleted for 2 minutes by centrifugation in a microfuge
and washed with TBS which can contain 0.1% Tween 20.
Three additional washes were performed where the last
wash did not contain any Tween 20. The bound phage were
then eluted with 200 μl 0.1 M Glycine-HCl, pH 2.2 for 15
minutes and the beads were spun down by centrifugation.
The supernatant containing phage (eluate) was removed and
phage exhibiting binding to the ligand binding protein
were further enriched by one to two more cycles of
panning. Typical yields after the first eluate were

about 1 X 10— 5 X 10* pfu. The. second and third eluate 
generally yielded about 5 x lo' - 2 x 10^ pfu and 5 x
10^ - 1 X lo'° pfu, respectively.



The second or third eluate was plated at a suitable
density for plaque identification screening and
sequencing of positive clones (i.e., plated at confluency
for rare clones and 200-500 plaques/plate if pure plaques
were needed). Briefly, plaques were grown for about 6 hours
at 37°C and were overlaid with nitrocellulose filters
that had been soaked in 2 mM IPTG and then briefly dried.
The filters remained on the plaques overnight at room






wo 92/06176 



PCT/US91/07141 






temperature, removed and placed in blocking solution for
1-2 hours. Following blocking, the filters were
incubated in 1 ng/ml ligand binding protein in blocking
solution for 1-2 hours at room temperature. Goat
antimouse Ig-coupled alkaline phosphatase (Fisher) was
added at a 1:1000 dilution and the filters were rapidly
washed with 10 ml of TBS or block solution over a glass
vacuum filter. Positive plaques were identified after
alkaline phosphatase development for detection.

Screening of the degenerate oligonucleotide library
with several different ligand binding proteins resulted
in the identification of peptide sequences which bound to
each of the ligands. For example, screening with an
antibody to β-endorphin resulted in the detection of
about 30-40 different clones which essentially all had
the core amino acid sequence known to interact with the
antibody. The sequences flanking the core sequences were
different, showing that they were independently derived
and not duplicates of the same clone. Screening with an
antibody known as 57 gave similar results (i.e., a core
consensus sequence was identified but the flanking
sequences among the clones were different).

EXAMPLE IV 

Synthesis of a Left Half Random Oligonucleotide Library

This example shows the synthesis and construction of
a left half random oligonucleotide library.

A population of random oligonucleotides nine codons
in length was synthesized as described in Example I
except that different sequences at their 5' and 3' ends
were synthesized so that they could be easily inserted
into the vector by mutagenesis. Also, the mixing and
dividing steps for generating random distributions of




reaction products were performed by the alternative method
of dispensing equal volumes of bead suspensions. The
liquid chosen that was dense enough for the beads to
remain dispersed was 100% acetonitrile.

Briefly, each column was prepared for the
coupling reaction by suspending 22 mg (1 μmole) of 48
μmol/g capacity beads (Genta, San Diego, CA) in 0.5 ml
of 100% acetonitrile. These beads are smaller than those
described in Example I and are derivatized with a guanine
nucleotide. They also do not have a controlled pore
size. The bead suspension was then transferred to an
empty reaction column. Suspensions were kept relatively
dispersed by gently pipetting the suspension during
transfer. Columns were plugged and monomer coupling
reactions were performed as shown in Table XII.

Table XII

Column        Sequence (5' to 3')

column 1L     AA(A/C)GGCTTTTGCCACAGG
column 2L     AG(A/G)GGCTTTTGCCACAGG
column 3L     AT(A/G)GGCTTTTGCCACAGG
column 4L     AC(A/G)GGCTTTTGCCACAGG
column 5L     CA(G/T)GGCTTTTGCCACAGG
column 6L     CT(G/C)GGCTTTTGCCACAGG
column 7L     AG(T/C)GGCTTTTGCCACAGG
column 8L     AT(T/C)GGCTTTTGCCACAGG
column 9L     CC(A/C)GGCTTTTGCCACAGG
column 10L    T(A/T)TGGCTTTTGCCACAGG



After coupling of the last monomer, the columns were
unplugged as described previously and their contents were
poured into a 1.5 ml microfuge tube. The columns were
rinsed with 100% acetonitrile to recover any remaining
beads. The volume used for rinsing was determined so
that the final volume of total bead suspension was about
100 μl for each new reaction column that the beads would
be aliquoted into. The mixture was vortexed gently to
produce a uniformly dispersed suspension and then
divided, with constant pipetting of the mixture, into
equal volumes. Each mixture of beads was then
transferred to an empty reaction column. The empty tubes
were washed with a small volume of 100% acetonitrile and
the washes were also transferred to their respective
columns. Random codon positions 2 through 9 were then
synthesized as described in Example I where the mixing and
dividing steps were performed using a suspension in 100%
acetonitrile. The coupling reactions for codon positions
2 through 9 are shown in Table XIII.







Table XIII

Column        Sequence (5' to 3')

column 1L     AA(A/C)A
column 2L     AG(A/G)A
column 3L     AT(A/G)A
column 4L     AC(A/G)A
column 5L     CA(G/T)A
column 6L     CT(G/C)A
column 7L     AG(T/C)A
column 8L     AT(T/C)A
column 9L     CC(A/C)A
column 10L    T(A/T)TA



After coupling of the last monomer for the ninth
codon position, the reaction products were mixed and a
portion was transferred to an empty reaction column.
Columns were plugged and the following monomer coupling
reactions were performed: 5'-CGGATGCCTCAGAAGCCCCXXA-3'
(SEQ ID NO: 60). The resulting population of random
oligonucleotides was purified and incorporated by
mutagenesis into the left half vector M13ED04.

M13ED04 is a modified version of the M13ED03 vector
described in Example III and therefore contains all the
features of that vector. The difference between M13ED03
and M13ED04 is that M13ED04 does not contain the five
amino acid sequence (Tyr Gly Gly Phe Met) recognized by the
anti-β-endorphin antibody. This sequence was deleted by
mutagenesis using the oligonucleotide 5'-
CGGATGCCTCAGAAGGGCTTTTGCCACAGG-3' (SEQ ID NO: 61). The
entire nucleotide sequence of this vector is shown in
Figure 10 (SEQ ID NO: 6).

Although the invention has been described with
reference to the presently preferred embodiment, it
should be understood that various modifications can be
made without departing from the spirit of the invention.
Accordingly, the invention is limited only by the claims.
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SEQUENCE LISTING 



(1) GENERAL INFORMATION: 

(i) APPLICANT: Huse, William D.

(ii) TITLE OF INVENTION: SURFACE EXPRESSION LIBRARIES OF 
RANDOMIZED PEPTIDES 

(iii) NUMBER OF SEQUENCES: 61

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Pretty, Schroeder, Brueggemann & Clark

(B) STREET: 444 South Flower Street, Suite 2000 

(C) CITY: Los Angeles 

(D) STATE: California 

(E) COUNTRY: United States 

(F) ZIP: 90071 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 

(C) CLASSIFICATION: 

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: Campbell, Cathryn A 

(B) REGISTRATION NUMBER: 31,815 

(C) REFERENCE/DOCKET NUMBER: P31 9072 

(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: (619) 535-9001 

(B) TELEFAX: (619) 535-8949 

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7294 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: circular 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60 

ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120 

CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180 

GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCGA 240 

TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300 

TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360 
TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 
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CAGGGTAAAG ACCTGATTTT TGATTTAIGG TGATTCTCGT TTTCTGAACT GnTAAAGCA 
TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 
^AAGATTTTA CTATTACGCC GTGTGGGAAA AGTTCTTTTG CAAAAGCCTC TCGCTATTTT 
GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGGTCTTAC TATGCCTCGT 
^TTCCTTTT GGCGTTATGT, ATGTGGATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 
ATGAATGTTT CTACCTGTAA TAATGITGn GCGTTAGTTC GTimT^ CGTAGATTTX . . 
TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA- 
CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 
CTGGTCAGGG CAAGCGTTAT TCACTGAATG AGCAGGTTTG TTACGTTGAT TTGGGTAATG 
AATATCGGGT TCTTGTCAAG ATTACTCrTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 
TGTACACCGT TCATCTGTCC TCTTTCAAAG TIGGTCAGTT CGGTTCCCTT ATGATTGACC . 
GTCTGCGCCT CGTTCCGGCT AAGTAAGATG GAGGAGGTCG CGGATTTCGA CACAATTTAT 
CAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 
■ CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 
GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC. ATGAAAAAGT CTTTAGTCCT 
CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA 
CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 
TGCGTGGGCG ATGGTTGTTG TCATIGTCGG CGCAACTATC GGTATCAAGC TGTITAAGAA 
ATTCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGCCTTTT 
TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTIAGT IGTICCTTTC 
TATTCTCACT CCGCTGAAAG TGTTGAAAGT TGTTTAGCAA A^CCCCATAC AGAAAATTCA 
TTTACTAACG TCTGGAAAGA CGACAAAACT TTAGATCGTr ACGCTAACTA TGAGGGTTGT 
CTGTGGAATG CTACAGGCGT TGTAGTTTGT ACTGGTGACG AAACTCAGTG TTACGGTACA 
TGGGtTGCrA TTGGGCTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA GGGTGGCGGT 
TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT ACTAAACCTC CTGAGTACGG. TGATACACCT 
ATTCCGGGCT ATACTEATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG TACTGAGCAA 
AACCCCGCTA ATCCTAATCC nCTCTTGAG GAGTCTCAGC CTCTTAATAC TTrCATGTTI 
CAGAATAATA GGTTCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG CACTGTTACT 
CAAGGCACTG ACCCCGTTAA AACTTATTAC CAGTACAGTC CTGTATCATC AAAAGCCATG 
TATGACGCTT ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG CTTmTGAA 
GATCCATTCG TTTGTGAATA TCAAGGCCAA TCGTCTGACC TGCCTCAACC TCCTGTCAAT 
GCTGGCGGCG GCTCrGGTGG TGGTTCTGGT GGCGGCTCTG AGGGTGGTGG CTCTGAGGGT 
GGCGGTTCTG AGGGTGGCGG CTCTGAGGGA GGCGGTTCCG GTGGTGGCTC TGGTTCCGGT 
GATTTTGATT ATGAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA AAATGCCGAT 
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GAAAACGCGC TACAGTCTGA CGCTAAAGGC AAACTTGATT CTGTCGCTAC TGATTACGGT 2520 

GCTGCTATCG ATGGTTTCAT TGGTGACGTT TCCGGCCTTG CTAATGGTAA TGGTGCTACT 2580

GGTGATTTTG CTGGCTCTAA TTCCCAAATG GCTCAAGTCG GTGACGGTGA TAATTCACCT 2640

TTAATGAATA ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA ATGTCGCCCT 2700

TTTGTCTTTA GCGCTGGTAA ACCATATGAA TTTTCTATTG ATTGTGACAA AATAAACTTA 2760

TTCCGTGGTG TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT ATTTTCTACG 2820

TTTGCTAACA TACTGCGTAA TAAGGAGTCT TAATCATGCC AGTTCTTTTG GGTATTCCGT 2880

TATTATTGCG TTTCCTCGGT TTCCTTCTGG TAACTTTGTT CGGCTATCTG CTTACTTTTC 2940

TTAAAAAGGG CTTCGGTAAG ATAGCTATTG CTATTTCATT GTTTCTTGCT CTTATTATTG 3000

GGCTTAACTC AATTCTTGTG GGTTATCTCT CTGATATTAG CGCTCAATTA CCCTCTGACT 3060

TTGTTCAGGG TGTTCAGTTA ATTCTCCCGT CTAATGCGCT TCCCTGTTTT TATGTTATTC 3120

TCTCTGTAAA GGCTGCTATT TTCATTTTTG ACGTTAAACA AAAAATCGTT TCTTATTTGG 3180

ATTGGGATAA ATAATATGGC TGTTTATTTT GTAACTGGCA AATTAGGCTC TGGAAAGACG 3240

CTCGTTAGCG TTGGTAAGAT TCAGGATAAA ATTGTAGCTG GGTGCAAAAT AGCAACTAAT 3300

CTTGATTTAA GGCTTCAAAA CCTCCCGCAA GTCGGGAGGT TCGCTAAAAC GCCTCGCGTT 3360

CTTAGAATAC CGGATAAGCC TTCTATATCT GATTTGCTTG CTATTGGGCG CGGTAATGAT 3420

TCCTACGATG AAAATAAAAA CGGCTTGCTT GTTCTCGATG AGTGCGGTAC TTGGTTTAAT 3480

ACCCGTTCTT GGAATGATAA GGAAAGACAG CCGATTATTG ATTGGTTTCT ACATGCTCGT 3540

AAATTAGGAT GGGATATTAT CTTCCTTGTT CAGGACTTAT CTATTGTTGA TAAACAGGCG 3600

CGTTCTGCAT TAGCTGAACA TGTTGTTTAT TGTCGTCGTC TGGACAGAAT TACTTTACCT 3660

TTTGTCGGTA CTTTATATTC TCTTATTACT GGCTCGAAAA TGCCTCTGCC TAAATTACAT 3720 

GTTGGCGTTG TTAAATATGG CGATTCTCAA TTAAGCCCTA CTGTTGAGCG TTGGCTTTAT 3780 

ACTGGTAAGA ATTTGTATAA CGCATATGAT ACTAAACAGG CTTTTTCTAG TAATTATGAT 3840 

TCCGGTGTTT ATTCTTATTT AACGCCTTAT TTATCACACG GTCGGTATTT CAAACCATTA 3900 

AATTTAGGTC AGAAGATGAA GCTTACTAAA ATATATTTGA AAAAGTTTTC ACGCGTTCTT 3960 

TGTGTTGCGA TTGGATTTGC ATCAGCATTT ACATATAGTT ATATAACGCA ACCTAAGCCG 4020

GAGGTTAAAA AGGTAGTCTC TCAGACCTAT GATTTTGATA AATTCACTAT TGACTCTTCT 4080

CAGCGTGTTA ATCTAAGCTA TCGCTATGTT TTCAAGGATT CTAAGGGAAA ATTAATTAAT 4140

AGCGACGATT TACAGAAGCA AGGTTATTCA CTCACATATA TTGATTTATG TACTGTTTCC 4200

ATTAAAAAGG TAATTCAAAT GAAATTGTTA AATGTAATTA ATTTTGTTTT CTTGATGTTT 4260

GTTTCATCAT CTTCTTTTGC TCAGGTAATT GAAATGAATA ATTCGCCTCT GCGCGATTTT 4320

GTAACTTGGT ATTCAAAGCA ATCAGGCGAA TCCGTTATTG TTTCTCCCGA TGTAAAAGGT 4380

ACTGTTACTG TATATTCATC TGACGTTAAA CCTGAAAATC TACGCAATTT CTTTATTTCT 4440

GTTTTACGTG CTAATAATTT TGATATGGTT GGTTCAATTC CTTCCATTAT TTAGAAGTAT 4500 
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,,,,^„,-t^C.TGAA CTC.TMTCA GGMTATGAT 

ITTAMATTA ATAACGTTCO GCCAAAf^OAT TIAATA ^ ^„,CrrOIT- : 

: .CrAATAGTX GTAAATGGXC ^^rcTGX^^T^^^^ , 

.CTGACCAGA TAITGATTGA GGGIITGATA ^ ^^CTGACCGC 

nrXGATTTG CTGGTGGCTC TCAGCGXGGC ACTGTT A ^. ^ . 
CTCAGCtCTG TXTXATGT.C TGCTGGTGGT 

..C.A.CAG TXCGGGPAT. AAAGAC^T , 

- ™ — - r^G GA.GAGGG. 

.CIGGTCGrG TGACTGGtOA AIGIG^ ^^^^^^ 
^^AO GXAXXtCGAT OAGCG^^ ~ ^ ^^^^^^ 
„A GGAGCAAGGC GGA^^^ « ^^^^^^^^^^ 
.CtAATCAAA GAAGTATXGC ^^.^ OCTGTCTAAA 

.OTGCCGTGA CXGATTATAA ^^"^^^ ^.CAGGXTA 
.XGCCrnAA IGGGCCTCa GTTtAGCICC CGCTCTOm ^^^^^ 
, TAG=.GC.CG.CAAAGCAAOCAXAG,A=.«^^ 

„.r ACGGGCAGGG XGACCGCXA^ ^^G^^ ^ ^^^^^ 

CGCXIXGTrC CCXTGOXXTG XCGGCAGGXX CGCCGGGITI = _ ^oXXGA 
„X XXAGGGTXCG GAXXXAGXGC « 

, „GAX GGXXGAGGXA OXGGGCCAXC GCCGXG^^ A^^ 
. „C ACGXXG^-— ~ ~AAC CACGAXCAAA 

— rG::irrx 

OGACAGGXXXCGCGAGXGGAA^G.C.X^- 

GGACGGC^O ™- ..OOAXGXAG OAAXXGGCAG : 

TOTGAGCGGA XAACAATTIG AOACAGGAAA CA „ TGCA.TICAAI 

„AGC TGGGGGGAXC ^.^XGGX AOXAGXXAXA 

CAAGXGGXA. XGAGXAGA^ " „^^CCA 
„CrAGCAXAGGGAXTAAAXXAXXCAAAAAGrnA ^ ^ 

.^AA.XAGGGAAGAG GCGGOGACGG -OGC^^^ -^^^^ 
^GGCGAAXG GOGGTTXGCG XGGIXXGCCG CACCAGAAGG CGTG 



4560
4620
4680
4740
4800
4860
4920
4980
5040
5100
5160
5220
5280
5340
5400
5460
5520
5580
5640
5700
5760
5820
5880
5940
6000
6060
6120
6180
6240
6300
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6420 
6480 
6540
AGTGCGATCT TCCTGAGGCC GATACGGTCG TCGTCCCCTC AAACTGGCAG ATGCACGGTT 6600
ACGATGCGCC CATCTACACC AACGTAACCT ATCGCATTAC GGTCAATCCG CCGTTTGTTC 6660
CCACGGAGAA TCCGACGGGT TGTTACTCGC TCACATTTAA TGTTGATGAA AGCTGGCTAC 6720
AGGAAGGCCA GACGCGAATT ATTTTTGATG GCGTTCCTAT TGGTTAAAAA ATGAGCTGAT 6780
TTAACAAAAA TTTAACGCGA ATTTTAACAA AATATTAACG TTTACAATTT AAATATTTGC 6840
TTATACAATC TTCCTGTTTT TGGGGCTTTT CTGATTATCA ACCGGGGTAC ATATGATTGA 6900
CATGCTAGTT TTACGATTAC CGTTCATCGA TTCTCTTGTT TGCTCGAGAC TCTCAGGCAA 6960
TGACCTGATA GCCTTTGTAG ATCTCTCAAA AATAGCTACG CTCTCCGGCA TTAATTTATC 7020
AGCTAGAAGG GTTGAATATC ATATTGATGG TGATTTGACT GTCTCCGGCC TTTCTCACCG 7080
TTTTGAATCT TTACCTACAC ATTACTCAGG CATTGCATTT AAAATATATG AGGGTTCTAA 7140
AAATTTTTAT CCTTGCGTTG AAATAAAGGC TTCTCCCGCA AAAGTATTAC AGGGTCATAA 7200
TGTTTTTGGT ACAACCGATT TAGCTTTATG CTCTGAGGCT TTATTGCTTA ATTTTGGTAA 7260
TTCTTTGCCT TGCCTGTATG ATTTATTGGA CGTT 7294



(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7320 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both

(D) TOPOLOGY: circular 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 
AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60
ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120
CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180
GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 240
TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300
TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360
TCTTTGGGGC TTCGTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 420
CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480
TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540
AAACATTTTA CTATTACCCC CTGTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600
GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660
AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAAGTG 720
ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780
TCTTCCCAAC GTCCTGACTG GTATAATGAG CCACTTCTTA AAATCGGATA AGGTAATTCA 840
CAATGATTAA AGTTGAAATT AAAGCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 900
CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 960
AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCGAGCCTAT GCGCCTGGTC 1020
TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT GGGTTCCCTT ATGATTGACC 1080
GTGTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CAGAATTTAT 1140
CAGGcktGA TACAAATCTC CGTTGTACTT TGTTTCGCGC ^'^T^^^ 1200
CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTGTTTCGT TTTAGGTTGG TGCCTTGGTA 1260
GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCGTC ATGAAAAAGT GTTTAGTCGT 1320
GAAAGCGTCT GTAGCCGTTG CTACCCTCGT TCCGATGGTG TCTTTCGCTG CTGAGGGTGA 1380
CGATGCCGCA AAAGGGGCCT TTAACTCCCT GGAAGCCTCA GGGACCGAAT ATATCGGTTA 1440
TGGGTGGGCG ATGGTTGTTG TCATTGTCGG CGGAACTATC GGTATCAAGG TGTTTAAGAA 1500
ATTGAGGTGG AAAGGAAGGT GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGCCTTTT 1560
TTTTTGGAGA TTTTGAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC 1620
TATTCTCACT CCGCTGAAAC TGTTGAAAGT TGTTTAGCAA AACCCCATAC AGAAAATTCA 1680
TTTACTAACG TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAACTA TGAGGGTTGT 1740
CTGTGGAATG GTAGAGGCGT TGTAGTTTGT ACTGGTGACG AAACTGAGTG TTAGGGTACA 1800
TGGGTTCGTA TTGGGCTTGC TATGCCTGAA AATGAGGGTG GTGGCTCTGA GGGTGGCGGT 1860
TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT AGTAAAGGTG CTGAGTACGG TGATAGAGGT 1920
ATTCCGGGCT ATACTTATAT CAACCCTCTC GACGGGAGTT ATCCGGG^GG TAGTGAGCAA 1980
AACCGGGGTA ATCCTAATCC TTGTGTTGAG GAGTCTGAGC CTGTTAATAG TTTCATGTTT 2040
GAGAATAATA GGTTCCGAAA TAGGCAGGGG GGATTIAAGTG TTTATACGGG GAGTGTTAGT 2100
CAAGGCACTG ACGCGGTTAA AACTTATTAC CAGTAGACTC CTGTATCATC AAAAGCCATG 2160
TATGACGCTT AGTGGAAGGG TAAATTCAGA GAGTGCGCTT TCGATTCTGG GTTTAATGAA 2220
GATCCATTCG TTTGTGAATA TGAAGGCCAA TCGTCTGAGC TGCCTCAACG TCCTGTCAAT 2280
GGTGGCGGCG GCTCTGGTGG TGGTTCTGGT GGCGGCTCTG AGGGTGGTGG GTCTGAGGGT 2340
GGCGGTTCTG AGGGTGGCGG CTCTGAGGGA GGGGGTTGCG GTGCTGGCTG TGGTTGCGGT 2400
GATTTTGATT ATGAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA AAATGCCGAT 2460
GAAAACGCGC TAGAGTCTGA CGCTAAAGGG AAACTTGATT GTGTCGCTAC TGATTACGGT 2520
GCTGCTATGG ATGGTTTGAT TGGTGACGTT TCCGGCCTTG CTAATGGTAA TGGTGCTACT 2580
GGTGATTTTG CTGGCTCTAA TTCCCAAATG GCTCAAGTCG GTGACGGTGA TAATTCACCT 2640
TTAATGAATA ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA ATGTCGCCCT 2700
TTTGTCTTTA GCGCTGGTAA ACCATATGAA TTTTCTATTG ATTGTGACAA AATAAACTTA 2760
TTCCGTGGTG TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT ATTTTCxkcG 2820
TTTGCTAACA TACTGCGTAA TAAGGAGTGT TAATCATGCC AGTTCTTTTG GGTATTCCGT 2880



WO 92/06176



PCT/US91/07141 




TATTATTGCG TTTCCTCGGT TTCCTTCTGG TAACTTTGTT CGGCTATCTG CTTACTTTTC 2940 

TTAAAAAGGG CTTCGGTAAG ATAGCTATTG CTATTTCATT GTTTCTTGCT CTTATTATTG 3000 

GGCTTAACTC AATTCTTGTG GGTTATCTCT CTGATATTAG CGCTCAATTA CCCTCTGACT 3060 

TTGTTCAGGG TGTTCAGTTA ATTCTCCCGT CTAATGCGCT TCCCTGTTTT TATGTTATTC 3120 

TCTCTGTAAA GGCTGCTATT TTCATTTTTG ACGTTAAACA AAAAATCGTT TCTTATTTGG 3180 

ATTGGGATAA ATAATATGGC TGTTTATTTT GTAACTGGCA AATTAGGCTC TGGAAAGACG 3240 

CTCGTTAGCG TTGGTAAGAT TTAGGATAAA ATTGTAGCTG GGTGCAAAAT AGCAACTAAT 3300

CTTGATTTAA GGCTTCAAAA CCTCCCGCAA GTCGGGAGGT TCGCTAAAAC GCCTCGCGTT 3360

CTTAGAATAC CGGATAAGCC TTCTATATCT GATTTGCTTG CTATTGGGCG CGGTAATGAT 3420

TCCTACGATG AAAATAAAAA CGGCTTGCTT GTTCTCGATG AGTGCGGTAC TTGGTTTAAT 3480

ACCCGTTCTT GGAATGATAA GGAAAGACAG CCGATTATTG ATTGGTTTCT ACATGCTCGT 3540 

AAATTAGGAT GGGATATTAT GTTCCTTGTT CAGGACTTAT CTATTGTTGA TAAACAGGGG 3600 

CGTTCTGCAT TAGCTGAACA TGTTGTTTAT TGTCGTCGTC TGGACAGAAT TACTTTACCT 3660 

TTTGTCGGTA CTTTATATTC TCTTATTACT GGCTCGAAAA TGCCTCTGCC TAAATTACAT 3720 

GTTGGCGTTG TTAAATATGG CGATTCTCAA TTAAGCCCTA CTGTTGAGCG TTGGCTTTAT 3780

ACTGGTAAGA ATTTGTATAA CGCATATGAT ACTAAACAGG CTTTTTCTAG TAATTATGAT 3840

TCCGGTGTTT ATTCTTATTT AACGCCTTAT TTATCACACG GTCGGTATTT CAAACCATTA 3900 

AATTTAGGTC AGAAGATGAA ATTAACTAAA ATATATTTGA AAAAGTTTTC TCGCGTTCTT 3960 

TGTCTTGCGA TTGGATTTGC ATCAGCATTT ACATATAGTT ATATAACCCA ACCTAAGCCG 4020 

GAGGTTAAAA AGGTAGTCTC TCAGACCTAT GATTTTGATA AATTCACTAT TGACTCTTCT 4080 

CAGCGTCTTA ATCTAAGCTA TCGCTATGTT TTCAAGGATT CTAAGGGAAA ATTAATTAAT 4140 

AGCGACGATT TACAGAAGCA AGGTTATTCA CTCACATATA TTGATTTATG TACTGTTTCC 4200 

ATTAAAAAAG GTAATTCAAA TGAAATTGTT AAATGTAATT AATTTTGTTT TCTTGATGTT 4260

TGTTTCATCA TCTTCTTTTG CTCAGGTAAT TGAAATGAAT AATTCGCCTC TGCGCGATTT 4320

TGTAACTTGG TATTCAAAGC AATCAGGCGA ATCCGTTATT GTTTCTCCCG ATGTAAAAGG 4380

TACTGTTACT GTATATTCAT CTGACGTTAA ACCTGAAAAT CTAGGCAATT TCTTTATTTC 4440

TGTTTTACGT GCTAATAATT TTGATATGGT TGGTTCAATT CCTTCCATAA TTCAGAAGTA 4500

TAATCCAAAC AATCAGGATT ATATTGATGA ATTGCCATCA TCTGATAATC AGGAATATGA 4560

TGATAATTCC GCTCCTTCTG GTGGTTTCTT TGTTCCGCAA AATGATAATG TTACTCAAAC 4620 

TTTTAAAATT AATAACGTTC GGGCAAAGGA TTTAATACGA GTTGTCGAAT TGTTTGTAAA 4680 

GTCTAATACT TCTAAATCCT CAAATGTATT ATCTATTGAC GGCTCTAATC 4740 

TAGTGCACCT AAAGATATTT TAGATAACCT TCCTCAATTC CTTTCTACTG TTGATTTGCC 4800 

AACTGACCAG ATATTGATTG AGGGTTTGAT ATTTGAGGTT CAGCAAGGTG ATGCTTTAGA 4860 

TTTTTCATTT GGTGCTGGCT CTCAGCGTGG CACTGTTGCA GGCGGTGTTA ATACTGACCG 4920
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• COTaCCTCI-OTmATCTT.CTGCTCOTCC.TtCGITCCGI ATTTmATG CCOATOITTT 
' ^OGCmiC. dlTCCCCCAI TMACACT« TACCCATTCA AAAAUIIOT CIGTGCCACO 
TATTCHACG cmCAGGXC AGAAGGGITC lATCIGTGIT GGGCAGMTG ICGGTnlAT 
TAGTGGTCGl' GTGACTGGTG AATGIGGGAA IGIAAAIAAI CCATTICAGA GGATTGAGGG 
TGAAAAIGTA GGIATITCGA TGAGCGITTT ICGIGnGCA ATGGCtGGCG GIAATAITGI 
TGIGGATAII . ACGAGGkGG CCGAfiGITT GAGlICnGT AGtCAGGGAA, GIGATGTIAT- 
TACIAAIGAA AGAAGIAItG' GIAGAACGGX TAAITIGGGI GATGGACAGA GIGTTTTAGI : 
CGGIGGGGTG AGTGAITAIA AAAAGACTIG IGAAGATIGT GGGGIACCGI IGCTGIGIAA 

aXiggcitta aigggcgtcg tgttiaggig ggggicigai igcmcgagg AAAGCAGGH 

ATAGGIGGTC GIGAAAGGAA GGAIAGIAGG GGCCGIGTAG CGGCGCATIA AGGGGGGGGG 
GTGIGGTGGT lAGGGGCAGC GTGAGGGGTA GAGITGGGAG CGCCCIAGCG GGGGGIGGTT ' 
ICGGHTGn CGCTIGGITI GIGGCGAOGI lOGCGGGGIT IGGCCGTCAA GGTCIAAATG 
GGGGGGTGGC TTTAGGGITG GGAnTAGTG GTTTACCGGA CGICGAGCGC AAAAAAGTTC 
AITTGGGIGA IGGTIGAGGI AGTGGGCCAI CGGGCIGAXA GACGGTflTr GGCGCTTTGA 
CGTTGGAGTG CACGITGm AAIAGTGGAG IGTTGTICCA AAGIGGAAGA ACACTGAACC 
CTAIGIGGGG GTAtTCITTT GAttTATAAG GGAmTGGC GATTIGGGAA CGACGATGAA 
AGAGGAITIT GGGGTGCIGG GGCAAAGGAG GGIGGAGGGC TTGCIGGAAG IGrGIGAGGG 
GCAGGGGGTG AAGGGGAAIG AGGTGITGGG GGXaGGCIG GTGAAAAGAA AAAGCAGCGT 
CGGGGGGAAT AGGGAAAGGO GGIGTGCGGG GGGGITGGCG GATICATIAA TGGAGGTGGG 
AGGAGAGGIT IGGGGAGTGG AAAGGGGGGA GTGAGCGGAA GGCAATIAAT GTGAGITAGG 
TGAGIGATIA GGGAGGGGAG GGttlAGAGT TTATGGrTGG GGGTCGIATG TTGIGTGGAA 
nGTGAGOGG ATAACAATTr GAGAGGGGAA GGAGAGAGIG ATAATGAAAT AGOTATTGGG 
lACGGCAGGG GGIOGATTGT TATTAGIGGG TGGGGAAGGA GGCATGGGGG AGGTCGTGAI 
GAGGGAGAGT GGAGAATTGG ATCGGGAAtG AGIGTrAATT GTAGAAGGGG TAAGGTTGGG 
ACrGOGCGTG GTITIAGAAG GTCGTGACTG GGAAAAGGGI GGGGTIACGG AAGTIAATGG • 
GGTIGGAGGA CAGCGGGGTT TGGGGAGGIG GGGTAATAGG GAAGAGGGGG GCAGGGATGG 
CGGTTCGGAA GAGITGGGGA GGGTGAATGG GGAATGGGGG ITIGGCTGGT TIGGGGGACG 
AGAAGCGGtG GGGGAAAGGT GGGTGGAGIG GGATCtlGGT GAGGG6gATA GGGICGTCGI 
CGGGIGAAAC TGGGAGATGG AGGGTIACGA TGGGGCGATG TACAGGAAGG TAAGCTATGC 
CATOGGGIG AATGGGGGGT TTGITGGGAG GGAGAATGGG ACGGGnblT AaGGGTGAG 
ATIIAATGrt GATGAAAGGT GGCnAOAGGA AGGCGAGAGG GGAATIATTr ■ TIGATGGGGT 
TCGTATTGCI lAAAAAATGA GGTGATmA GAAAAAOTA AGGGGAATIT TAAGAAAATA 
TTAAGGim GAAITTAAAT ATTtGGTTAT AGAATGITGG IGmriGGG GGTmGTGA 
TTATCAAGGG GGGTAGATAI GATTCAGATG GIAGITITAG GATIAGCGTT GAICGATICT 



4980
5040
5100
5160
5220
5280
5340
5400
5460
5520
5580
5640
5700
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5940 
6000 
6060 
6120 
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CTTGTTTGCT CCAGACTCTC AGGCAATGAC CTGATAGCCT TTGTAGATCT CTCAAAAATA 7020 

GCTACCCTCT CCGGCATTAA TTTATCAGCT AGAACGGTTG AATATCATAT TGATGGTGAT 7080 

TTGACTGTCT CCGGCCTTTC TCACCCTTTT GAATCTTTAC CTACACATTA CTCAGGCATT 7140 

GCATTTAAAA TATATGAGGG TTCTAAAAAT TTTTATCCTT GCGTTGAAAT AAAGGCTTCT 7200

CCCGCAAAAG TATTACAGGG TCATAATGTT TTTGGTACAA CCGATTTAGC TTTATGCTCT 7260 

GAGGCTTTAT TGCTTAATTT TGCTAATTCT TTGCCTTGCC TGTATGATTT ATTGGACGTT 7320 

(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7445 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both

(D) TOPOLOGY: circular 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 
AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60 

ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120 

CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180 

GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 240 

TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300 

TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360 

TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA GTATAATAGT 420 

CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480 

TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540 

AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGGCTC TCGCTATTTT 600 

GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660

AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720

ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780

TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840 

CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 900 

CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 960 

AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 1020

TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCGCTT ATGATTGACC 1080 

GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140 

CAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 1200 

CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260 
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GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTrCCTC ATGAAAAAGT CTTIAGTCCT^ 
CAAAGCCTCT GTAGCCGTrG CTACCCTCGT TCCGATGCTC TCTTrCGCTG CTGAGGGTGA , 
. CGATCCCGCA AAAGGGGCCT Tl^CTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA..V 
TGCGTGGGCG ATGGrTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 
ATTCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGCCTTTT 
TTTTrGGAGA TTTTCAACGT GAAAAAATTA^mm^^ 

■TATTCTCACT CCGCTGAAAC TGTTGAAAGT TGTTTAGCAA AACCCCATAC AGAAAATTCA 
■ TTTAGTAACG TCTGGAAAGA CGACAAAACT TTAGATGGTT AGGCTAACTA TGAGGGTTGT 
CTGTGGAATG CTACAGGCGT TGTAGTTTGT ACTGGTGACG AAACTCAGTG TTACGGTACA 
TGGGTTCGTA TTGGGCTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA GGGTGGCGGT 
TCTGAGGGTG GCGGTTGTGA GGGTGGCGGT ACTAAACCTC CTGAGTACGG TGAIAGACCT 
ATTCCGGGCT ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG TACTGAGCAA 
AACCCCGCTA ATCCTAATCC TTCTCTTGAG GAGTCTCAGC CTCTTAATAC TTTCATGTTT 
CAGAATAATA GGTTCCGAAA TAGGCAGGGG GCATTAACtG TTTATACGGG CACTGTTACT 
CAAGGGACTG ACCCCGTTAA AACTTATTAC CAGTACACTC , CTGTATCATC .AAAAGCCATG 
TATGACGCTT ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG CTTTAATGAA 
GATCCATTCG TTTGTGAATA TCAAGGCCAA TCGTCTGACC TGCCTCAACC TCCTGICAAT 
GGTGGGGGCG GCTGTGGTGG TGGTTCTGGT GGCGGCTCTG AGGGTGGTGG CTCTGAGGGT 
GGCGGTTCTG AGGGTGGCGG CTCTGAGGGA GGCGGTTCCG GTGGTGGCTC TGGrrCGGGT 
GATTTTGATT ATGAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA AAATGCCGAT 
GAAAACGCGC TACAGTCTGA CGCTAAAGGC AAACTTGATT GTGTCGCTAC TGATTACGGT, 
GCTGCTATCG ATGGTTTGAT TGGTGACGTT TCCGGCCTTG GTAATGGTAA TGCriGCTACT 
GGTGATTTTG CTGGCTCTAA TTCCCAAATG GCTCAAGTCG GTGACGGTGA TAATrCACCT 
TTAATGAATA ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA AIGTCGCCCT 
TrrGTCrrrA GCGCTGGTAA ACCATATGAA TTTTCTATTG ATTGTGACAA AAIAAACTTA 
TTCCGTGGTG TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT ATTTrCTACG 
TTTGCTAACA TACTGCGTAA TAAGGAGTGT TAATCATGCC AGTrCTTTTG GGTATTCCGX 
TATTATTGCG TTTCCTCGGT TTCCTTCTGG TAACmGTT CGGCTATCTG CTIACTTTTC 
TTAAAAAGGG CTTCGGTAAG.ATAGCTATTG. CTATTICATr GTriCTTGCT CmTTATTG 
GGCTIAACTG AATTCTTGTG GGTTATCTCT CTGATATTAG CGCTCAATTA CCCTCTGACT 
TTGTTGAGGG TGTTCAGTTA ATTCTGCCGT CTAATGCGCT TCGCTGnTT TAIGTTAITC 
TCTCTGTAAA GGCTGCTATT TTCATTTTTG ACGTCAAACA AAAAATCGTr TCTTATTTGG 
ATTGGGATAA ATAAl^TGGC TGTnATm GTAACTGGCA AATrAGGCTC TGGAAAGAGG 
CTCGTTAGCG TTGGTAAGAT TCAGGATAAA ATTGTAGGTG GGTGCAAAAT AGCAACTAAT 
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CTTGATTTAA GGCTTCAAAA CCTCCCGCAA GTCGGGAGGT TCGCTAAAAC GCCTCGCGTT 3360 
CTTAGAATAC CGGATAAGCC TTCTATATCT GATTTGCTTG CTATTGGGCG CGGTAATGAT 3420
TCCTACGATG AAAATAAAAA CGGCTTGCTT GTTCTCGATG AGTGCGGTAC TTGGTTTAAT 3480 
ACCCGTTCTT GGAATGATAA GCAAAGACAG CCGATTATTG ATTGGTTTCT ACATGCTCGT 3540 
AAATTAGGAT GGGATATTAT TTTTCTTGTT CAGGACTTAT CTATTGTTGA TAAACAGGCG 3600 
CGTTCTGCAT TAGCTGAACA TGTTGTTTAT TGTCGTCGTC TGGACAGAAT TACTTTACCT 3660 
TTTGTCGGTA CTTTATATTC TCTTATTACT GGCTCGAAAA TGCCTCTGCC TAAATTACAT 3720
GTTGGCGTTG TTAAATATGG CGATTCTCAA TTAAGCCCTA CTGTTGAGCG TTGGCTTTAT 3780 
ACTGGTAAGA ATTTGTATAA CGCATATGAT ACTAAACAGG CTTTTTCTAG TAATTATGAT 3840 
TCCGGTGTTT ATTCTTATTT AACGCCTTAT TTATCACACG GTCGGTATTT CAAACCATTA 3900 
AATTTAGGTC AGAAGATGAA GCTTACTAAA ATATATTTGA AAAAGTTTTC ACGCGTTCTT 
TGTCTTGCGA TTGGATTTGC ATCAGCATTT ACATATAGTT ATATAACCCA ACCTAAGCCG 
GAGGTTAAAA AGGTAGTCTC TCAGACCTAT GATTTTGATA AATTCACTAT TGACTCTTCT 4080 
CAGCGTCTTA ATCTAAGCTA TCGCTATGTT TTCAAGGATT CTAAGGGAAA ATTAATTAAT 4140 
AGCGACGATT TACAGAAGCA AGGTTATTCA CTCACATATA TTGATTTATG TACTGTTTCC 4200 
ATTAAAAAAG GTAATTCAAA TGAAATTGTT AAATGTAATT AATTTTGTTT TCTTGATGTT 4260 
TGTTTCATCA TCTTCTTTTG CTCAGGTAAT TGAAATGAAT AATTCGCCTC TGCGCGATTT 
TGTAACTTGG TATTCAAAGC AATCAGGCGA ATCCGTTATT GTTTCTCCCG ATGTAAAAGG 
TACTGTTACT GTATATTCAT CTGACGTTAA ACCTGAAAAT CTACGCAATT TCTTTATTTC 
TGTTTTACGT GCTAATAATT TTGATATGGT TGGTTCAATT CCTTCCATAA TTCAGAAGTA 
TAATCCAAAC AATCAGGATT ATATTGATGA ATTGCCATCA TCTGATAATC AGGAATATGA 
TGATAATTCC GCTCCTTCTG GTGGTTTCTT TGTTCCGCAA AATGATAATG TTACTCAAAC 
TTTTAAAATT AATAACGTTC GGGCAAAGGA TTTAATACGA GTTGTCGAAT TGTTTGTAAA 
GTCTAATACT TCTAAATCCT CAAATGTATT ATCTATTGAC GGCTCTAATC TATTAGTTGT
TAGTGCACCT AAAGATATTT TAGATAACCT TCCTCAATTC CTTTCTACTG TTGATTTGCC 
AACTGACCAG ATATTGATTG AGGGTTTGAT ATTTGAGGTT CAGCAAGGTG ATGCTTTAGA 
TTTTTCATTT GCTGCTGGCT CTCAGCGTGG CACTGTTGCA GGCGGTGTTA ATACTGACCG
CCTCACCTCT GTTTTATCTT CTGCTGGTGG TTCGTTCGGT ATTTTTAATG GCGATGTTTT 
AGGGCTATCA GTTCGCGCAT TAAAGACTAA TAGCCATTCA AAAATATTGT CTGTGCGACG 
TATTCTTACG CTTTCAGGTC AGAAGGGTTC TATCTCTGTT GGCCAGAATG TCCCTTTTAT 
TACTGGTCGT GTGACTGGTG AATCTGCCAA TGTAAATAAT CCATTTCAGA CGATTGAGCG 5160
TCAAAATGTA GGTATTTCCA TGAGCGTTTT TCCTGTTGCA ATGGCTGGCG GTAATATTGT 
TCTGGATATT ACCAGCAAGG CCGATAGTTT GAGTTCTTCT ACTCAGGCAA GTGATGTTAT 
TACTAATCAA AGAAGTATTG CTACAACGGT TAATTTGCGT GATGGACAGA CTCTTTTACT 5340 
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CCOTCCCCTC ^^O^CTTO TC^CmCT caCOT.CCCT TCCXOT^^ 

MCCCTXT. .TCOCOCTCC TCTXTAOCTC CCCOtCXC.T TCCMCO.CO «.OCACm^ ■ 
.X^CCTCCTC CTCM^CC^ CO.XACT.CO COCCO.OT.C CCCCCC.m ACCOO. 
.,.C.C™c=T XACOC=C.C.CTO*=C.CTA CCmCCO caCCCT.OCC CCCOCTOCXT 
ICCCrrXCXX =CCTTCCTTT CTCCCCACOT TCGCCOOm XCCCCOtCM OCXCTAMTO 5640 
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lUOOJ. J.l'-'ii — -T- 

CaCCOCICCC mAGGCTTC; COiTTTAeiC CTTTACC.CA CCTCOACCCC MAAMCITG 
.TTTGGGTOA IGOilGACGI AGIGGGCCAI CGOCCIGAIA GACGOniTT CGGOCIITGA 5760 
<:<^OC=AGTC eACGTTCTTr AATAGTSCAC TCTtGnCCA AACtCGAACA ACACTCAACC 
CIATCIOGCG CTAItcmr GATTTATAAG GGATITTGCC GAITTCQGAA CCACGAIGAA 
^CACGATTTT CCCTGGTGG CGCAAACCAO GGTGGACCGC TTGCTCCAAC TCTCTCAGOG 
CCAGGCGGTG AAGCGGAATC AGCTGTTGCC CCTCICGCTG CTGAAAAGAA AAACGAGCGI 
COCGOCCAAT ACGCAAACCG CCTCTCCCCa C=c6tTC=CC GArTCATTAA TOGAGCTGGO 
. .CGACAGGTT tCCCGACTGG AAAGCGOOGA GTGAGCGCAA OGCAATTAAT OTGAGtXAGC 
XCACTCATTA CGCACCCCAO .CTTTACACT TTATCCTXCC GGCTCCTATC TTGTGTCOAA 
XXGIGAGCGG AIAACAATTT CACACGCGXC ACXXGGCACX GGCCGIGGH TXACAAGGIC S 4 
OXCACXGGGA AAACCCTGCC OXXACCCAAG OXXTCTACAX OGAGAAAAXA AAGXGAAACA 30 
^OCACTATX GOACXCGCAC XCXXACCGXT ACOCXXACXG mACCCCXC XGACAAAAOC 
CCCCCACGXC CACCXGGTCC AOXCAC=GCaX AXXGXCCCCA COG<=A«GXA CTAGXG.AXC 4 
CTAGGCTGAA GGCGAXGAGC CXGCXAAGCC TGCAXXCAAX ACrtXACAGG CAAGTGGXAC ^ 
XGACXAGAXX GGCXACCGTX GGGCTAXGGX AGXACXTAXA GXXGCXGGXA CGAXAGGOAX « 

XAAAXTATXC AAAAAGXXTA CGAGGAAGGC XXGXTAAOCA AXAGCOAAGA GGCCCGGACG ; 

O.XCGCCGTX GCCAACAGXX GCGCAGGCXG AATGGGGAAX MCGGXXXGC GXGGXTXCCG ■ .666, 

OCACGAGAAG CCGTGCGGGA AAGCXGCGXG CAGXGCGAXC XTCCXGA«=C GCAXACGGTC ^ 

CXCGTGCCCT CAAACXGGCA GAXGCAGGGT XACGATGCGC 'OdAXCTAGAC CAAGGXAACC ■ 67 

XATCCCAXIA CGGXCAAXGC GOCOXTTCnX COCACGGAGA AXCCGACGCC XTCTTACTCG 66^ 
CXCACAXXTAAXGTIGAXGAAAGCXaGGIACAGGAAGGCCAGACGCGAAXXAXTrTTGAX 6^ 

OOCGTTCCXA IXGSTXAAAA AAXGAGCIGA XTXAACAAAA AXTIAAGGCG AATIXTAAGA. ,69, 

MATATTAACGmACAAXT XAAATAXTIG CXTATAGAAX mCGXGXXX. TICGGGCTTX- .70 

XCXGATIAXG AACCGGGCIA CAXAXGAXXG ACAXGCXAGX XIXACGAXTA CGGTrGATCG 7 

;,TXCtCXXGX TIGCTCCAGA CICXCAGGCA AXGACCXGAI AGCCXTTGXA GATGTCXCAA . 71 

«^XACGTAC CGTCIGGGGC AXXAAXTIAX GAGCXAGAAC GGTXGAATAX GATATIGATG.. T 

GXGAXTIGAC IGICXCGGGO CTTICICACC CmiGAAIC nXAGCXAGA GATIACTCAG 7 
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GCTCTGAGGC TTTATTGCTT AATTTTGCTA ATTCTTTGCC TTGCCTGTAT GATTTATTGG 7440 

ACGTT 7445 

(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7409 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: both

(D) TOPOLOGY: circular

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60

ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120 

CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180 

GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 240 

TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300

TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360 

TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 420 

CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480 

TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540 

AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600 

GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660 

AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720 

ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780

TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840 

CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 900 

CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 960 

AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 1020

TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCGCTT ATGATTGACC 1080 

GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140 

CAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 1200

CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260

GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 1320

CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA 1380

CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 1440

TGCGTGGGCG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 1500
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• ■ " ..r.ArrTGATAAACCGATAGAATI/^GGCTCCTTTT.GGAGCCmT 

ATTCACCTCG AAAGCAAGCT GATAAAi.i-^'^ ^ ■ 

^c^CGT GAAAAAAm TTATTCGCAA TTCCTTTAGT TGTTCCTTTC 
TTTTTGGAGA TTTTCAACGT GAAAAAAii ^,^^.»4TTCA 

CCCOXCM.C XCXXCMAOt .CTTT.CC^ -CCCC.T.C 
XTTACTAACC ICTOGAMGA CGACMMCT TTAOATCOTT ACdOTMCTA TOACGCrrGT 
L aACAOCCCX TOTAC^X ACTCOX.ACC . AAAC.CAOTC TTACCCXACA 
""l^^GCCXXGCXAXCCCrCMAAXOA..^^^^ 

CXOACGOXC .CC~.CCOXC.CCCX ACXAAACCXC CXCAOXACOC TCAXACACCT 
™ CCCCX AXACXXAXAX CAACCCXCXC CACCCCACXT AXCCCCCXCC XACXCAC^ 

^CCCCCCXA AXCCXAAXCC XXCXCXXCAC CACXCXCAOC CXCXXAAXAC TXXCAXCX^ 

^.nrr^rrcc GCAXXAACIO IXTAXACGGG CACTGIXACX 

CAOAAIAAXA GCTTCCGAAA lACGCAGCCC GCAXXAACI ^ ....p^.^G 

GAAGGCACXG ACCGCGXXAA AACXXAXXAC CACTACACXC ™- 
XATOACCCXX AOXGGAACGC xkAATTCAGA GACTGCGCTT XGCAXXCXCG CXXXAAXGAA 
CAXCCAXXCG xrXCXGAAXA XCAACGCCAA XCGXCXGAGC XGCCXCAACC XCCXGXGAA 
OCXCGCGCCO CCTCTGGXGO XGOXXCXGGX GGCCGCXCXG ACGGTGGXCG CXCXGA G^ 
CGCGCXXCTG ACGCXCGCCC CXCXGACCGA GGCCCXXCCO CXC^CCXC TGCXT GGT 
GAXmGATr ATGAAAAGAX GCCAAACGCX AAXAACGGGC CXAXGACCCA AAAXGGCGAX 
CAAAACCCCC XACAGXCXCA CGCXAAACGC AAACXXGATT CXGTCGCXAC TGATTAGGGX 
ircXAXCC AXCGXXXCAX XCGXGACCXX XCCCGCCXXG CXAAXGCXAA XGGXGC^^ 
CGTGAXXXXG CXGGCXCXAA XXCCCAAAXC GCXCAAGXCG GXGACGCXCA XA^n ^ T 
XXAAXCAAXA AXTXCCGXCA AXATXTACCT XCCCXCCCXC AAXCGGX^A AXOXCG 
XTTCXCrnA GCGCXGGXAA ACCAXAXCAA .TXXtCXAXXG AXXGXGACAA AATAAACXXA 
XXCCCXCGTC XrrrXGCGXT XCXXTXAXAX GrrCCCACCX ttaxgxaxcx atxtxcxacg 
™CA XAC^CGXAA XAAGGAOXCX XAAXGAXOCC AGXXCXXXXG G^ , 
XATTArrGCG TXrCOXCCGX rtCCXXCXGG TAACmCXX CCGOXAXGXG CTTACTTXTC 
X^IaAGCG CnCGGXAAG ATACCXAXXG CTAXnCAXT GXXTCXTCCX GrTATXAT^ 
^^=XG AAXXCXTG^ GGXXAXCXCX CTGAXAXXAG CCCXCAATXA CCCTCXOAa 

™gc tgxxcagxxa axxcxcccgx cxaaxgcgcx xcc™ TAXGXT^: 

TcXCrGXAAA GGaCCXAXX XTCAX^C ACCXXAAACA AAAAATCGXX XCTT^^ 

<-TAArTrrrA AATTAGGCTC TGGAAAGAGG 
A-rrrcGATAA ATAATATGGC TGTTTATrTT GTAACTGGCA AATIA*. 

~ XXGOXAAGAX XCACGAXAAA AXXCXAGCXO CGXGGAAAAX AOCAACX^ 
™- GGCTXGAAAA CCXCCCGCAA dxCGGGAGCX XCGGXAAAAC GGGXCG^ 
™XAC CGGAXAAGCC rTCXAXAXGX GAXTTGCXXG CXAXXGGGGG, GGGTAAXGAT .. 
™rGAXG AAAAXAAAAA CGCCXXGCXX GXXGXCGAXG AGXGCGGXAC XTGOXTTAAX 
A^^CnCXX GOAAXCAXAA GCAAAGACAG CCGAXXAXTC AXXCGXXXCX ACAXGCXGGX. 
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AAATTAGGAT GGGATATTAT TTTTCTTGTT CAGGACTTAT CTATTGTTGA TAAACAGGCG 3600 

CGTTCTGCAT TAGCTGAACA TGTTGTTTAT TGTCGTCGTC TGGACAGAAT TACTTTACCT 3660

TTTGTCGGTA CTTTATATTC TCTTATTACT GGCTCGAAAA TGCCTCTGCC TAAATTACAT 3720 

GTTGGCGTTG TTAAATATGG CGATTCTCAA TTAAGCCCTA CTGTTGAGCG TTGGCTTTAT 3780 

ACTGGTAAGA ATTTGTATAA CGCATATGAT ACTAAACAGG CTTTTTCTAG TAATTATGAT 3840 

TCCGGTGTTT ATTCTTATTT AACGCCTTAT TTATCACACG GTCGGTATTT CAAACCATTA 3900 

AATTTAGGTC AGAAGATGAA GCTTACTAAA ATATATTTGA AAAAGTTTTC ACGCGTTCTT 3960 

TGTCTTGCGA TTGGATTTGC ATCAGCATTT ACATATAGTT ATATAACCCA ACCTAAGCCG 4020 

GAGGTTAAAA AGGTAGTCTC TCAGACCTAT GATTTTGATA AATTCACTAT TGACTCTTCT 4080

CAGCGTCTTA ATCTAAGCTA TCGCTATGTT TTCAAGGATT CTAAGGGAAA ATTAATTAAT 4140 

AGCGACGATT TACAGAAGCA AGGTTATTCA CTCACATATA TTGATTTATG TACTGTTTCC 4200 

ATTAAAAAAG GTAATTCAAA TGAAATTGTT AAATGTAATT AATTTTGTTT TCTTGATGTT 4260 

TGTTTCATCA TCTTCTTTTG CTCAGGTAAT TGAAATGAAT AATTCGCCTC TGCGCGATTT 4320 

TGTAACTTGG TATTCAAAGC AATCAGGCGA ATCCGTTATT GTTTCTCCCG ATGTAAAAGG 4380 

TACTGTTACT GTATATTCAT CTGACGTTAA ACCTGAAAAT CTACGCAATT TCTTTATTTC 4440

TGTTTTACGT GCTAATAATT TTGATATGGT TGGTTCAATT CCTTCCATAA TTCAGAAGTA 4500

TAATCCAAAC AATCAGGATT ATATTGATGA ATTGCCATCA TCTGATAATC AGGAATATGA 4560

TGATAATTCC GCTCCTTCTG GTGGTTTCTT TGTTCCGCAA AATGATAATG TTACTCAAAC 4620 

TTTTAAAATT AATAACGTTC GGGCAAAGGA TTTAATACGA GTTGTCGAAT TGTTTGTAAA 4680 

GTCTAATACT TCTAAATCCT CAAATGTATT ATCTATTGAC GGCTCTAATC TATTAGTTGT 4740

TAGTGCACCT AAAGATATTT TAGATAACCT TCCTCAATTC CTTTCTACTG TTGATTTGCC 4800 

AACTGACCAG ATATTGATTG AGGGTTTGAT ATTTGAGGTT CAGCAAGGTG ATGCTTTAGA 4860 

TTTTTCATTT GCTGCTGGCT CTCAGCGTGG CACTGTTGCA GGCGGTGTTA ATACTGACCG 4920 

CCTCACCTCT GTTTTATCTT CTGCTGGTGG TTCGTTCGGT ATTTTTAATG GCGATGTTTT 4980 

AGGGCTATCA GTTCGCGCAT TAAAGACTAA TAGCCATTCA AAAATATTGT CTGTGCCACG 5040 

TATTCTTACG CTTTCAGGTC AGAAGGGTTC TATCTCTGTT GGCCAGAATG TCCCTTTTAT 5100 

TACTGGTCGT GTGACTGGTG AATCTGCCAA TGTAAATAAT CCATTTCAGA CGATTGAGCG 5160 

TCAAAATGTA GGTATTTCGA TGAGCGTTTT TCCTGTTGCA ATGGCTGGCG GTAATATTGT 5220 

TCTGGATATT ACCAGCAAGG CCGATAGTTT GAGTTCTTCT ACTCAGGCAA GTGATGTTAT 5280

TACTAATCAA AGAAGTATTG CTACAACGGT TAATTTGCGT GATGGACAGA CTCTTTTACT 5340

CGGTGGCCTC ACTGATTATA AAAACACTTC TCAAGATTCT GGCGTACCGT TCCTGTCTAA 5400

AATCCCTTTA ATCGGCCTCC TGTTTAGCTC CCGCTCTGAT TCCAACGAGG AAAGCACGTT 5460 

ATACGTGCTC GTCAAAGCAA CCATAGTACG CGCCCTGTAG CGGGGCATTA AGCGCGGCGG 5520 

GTGTGGTGGT TACGCGCAGC GTGACCGCTA CACTTGCCAG CGCCCTAGCG CCCGCTCCTT 5580




5640 



.cocmcT cccTTOCTtr CTCCCC.COT TCCCC.CCTX ^«-<=^- ™^ 
.ccoocTCCc ™rxc ™ox= cxmcooc. cctcoacccc mam^ot^o 

X~CA XCOTTC.CCT .COCCCT.AT* aACCOTTTTT COCCCTTTOA =7.0 



.5820 



CCTIGSAGTC CACGtrCrrr AAtAOrOOAC TCTTOTICCA AACIGGAACA ACACICAACC. 

~o cta™ oa^axaac ocArmccc oaAccoaa ccaccaxcaa. .3S 

^CCTCCTCS CGCAAAOCAG C^^^ 
" AAOOOCAAXC AOCXOXXCCC COXCXCOCXO : CXGAAAACAA AAA.CACCCX 
CCAAX ACOCAAACCG CCXCXCCCCO CCCOXXGCCC CAXTCAXXAA XGCAGCXGCC 
.CCACACOIX XCCCOACXCG AAACCCOCCA OTOAOCCCAA COCAAXXAAX CXCAOXXAGC 6120 
XCACXCAXXA GGCACCCCAO COXTTACACX XXAXGCIXCC CGCXCOXAXC ■ TXGXGXGOAA 
■ XXGXGAGGGG AXAAGAAXXX CAGACGCGXC ACXXGGGACI GGGCffTGGXX XXACAACOXC 
SXGACXGGGA AAACCCXGOC GXXACCCAAG CXtlGXACAX GGAGAAAAXA AACXGAAACA 
MCCACXATX GCAGXGGCAC XCXXACCGXX ACXCXTXACC CCXGXGGCAA AAGGCXAXCG 
GGGGXXXAXG AGXXGXGAGG GAXGCGGAGO XGAAGGGGAX GACGCTGCXA AGGCXGGAXX 
C^XAGXXXA CAGGCAAGXG CIACXCAGXA CAXXGGCXAC GCXXGGGOXA XGGXAGTAGX 
XAXAGXXGCn GGXAGCAXAO OGA^AAAXX AnCAAAAAC XTTACGAGCA AGCCXTGXIA 
.CGAAXAGGG AAGAGGCCGG CACCGAXCGC CCXXGGGAAC AGXXGGGCAG CCXGAAXGGO 
r^GGCX XXGCCXGGXX XOCGGCACCA-GAAGCG<.GC CGGAAAGCXG GGXGGAOXGO 
^.XCXTCCXG AGCCCGAXAC G^COXCGXC CCCXCAAACX GGCAGAXCCA GG<^ACGA 
"rGAXCr ACACCAACGX AACCXAXCCG AXTAOGGXCA AXGGGCGCTX XGXXG^^ 
O^OAAXCdOA GGGGtXOXXA CXCGCTCACA XXXAAX,^G AX«AAAGCTG GCIA»G^ 
CGCCAGAGGC GAAXXAXm XCAXGGCGXX CCXAXXGGXX MAAAAXGAG CTGAXTXAAG 
^XTIAA GGGGAATX^ AAGAAAAIAT XAAGGXXXAG AATIXAAAXA TrXGCXXAXA 
^CXXCCr GXXTXXGGGG CXXXTCXGAX XATCAAGGGG GGXACAXAXG AXXGACAXGG, 

™agg axxaggcxxg axcgaxxcic xxgxxxccxc gagagxgxga gogaatgacc 

xTxagaxgxc xcaaaaaxao oagcgxcxg cgogaxxaax xtaxgaggxa m 

■ AXAXGAXAXX GAXGGXOAXX XOACXGXCXC GGGCC^GX CACCCrmG 7300 

M^LaCG XAGACAXXAC:XGAGGGAXXG GAXrXAAAAX AXAXCAGGGX XCTAAAAATT 
^XCCTXG GGTXGAAAXA AAGGCXXCXG CCGGAAAAGX AXXAGAGG^ GAXAAXGm 
SLxAGAAG CGAXXXAGCrr XXAXGCrCXG AGGCXTXAXT GCTXAATTTT GPXAATXCXX 
IGCCXIGCCX GIAXOATnA XIGGACOXI 
(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7296 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: circular 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60 

ATAGCTAAAC AGGTTATTGA CCATTTGCCA AATGTATCTA ATGGTCAAAC TAAATCTACT 120 

CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180 

GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 240 

TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300

TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360 

TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 420 

CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480 

TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540 

AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600 

GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660 

AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720 

ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780 

TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840 

CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 900 

CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 960

AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 1020 

TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCCCTT ATGATTGACC 1080 

GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140 

CAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 1200 

CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260 

GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 1320 

CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA 1380 

CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 1440 

TGCGTGGGCG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 1500 

ATTCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGCCTTTT 1560 

TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC 1620

TATTCTCACT CCGCTGAAAC TGTTGAAAGT TGTTTAGCAA AACCCCATAC AGAAAATTCA 1680 




■ rr..cc TCIGCAMOA CGACAWACT mCAtCGTl' ACGCIAACTA TGAGGCTTGT ■ 11 
TTTACTAACC ICIGCAAAGA ^ „ACG=TACA ■ , V 

CTGTGGAAIG GTACAGGCGT TGIAGTTTGI _„„o^oA GGGTGGCGGI 1 

„A,.GGGGnGGXA.CGC.GAAAA^- ~ 
TCTOAGCGIG GCOGrrGT^A OGGTGGGGGX ACXAAAC TO . , 

ATTOGGGGCI ATAGTTATAt GAACCGICTG GACGGCACIT , AIGCGGGTGG T ^ 
■EcGGGXA ATCGXAATGG -GXmGAG-GAGTGXCAGC CXGXXAAXAG 

„A .XXGCGAAA -GAGGGG GCAXXAACXC . 

OAAGGOACXG ACCCCGXXAA MCXXAIXAC GAGXAGACXC ""^^^ , 

XAXGAGGOTX AGXGGAACGG XAAAXXGAGA GAGXGGGGXT XGGAXXCX ^ 
gItCCATXGG XXXGXGAAXA XGAAGGGGAA XCGXGXGAGG XCGOXGAACG X«^^^^^ 

Sgcgggg ggxgxggxgg xggxtcxggx ggggggxcxg agggxggxg 

GCGGXXCXGAGGGXGGGGGGXGXGACGGAGGCGGXXCCGGXGGXGGG^ — 
.^GAXX AXGAAAAGAX GGGA.CGCX A«AAGG=GG GXAXGAG A ^XGGG. ^ 

; CXGGGXGXAA XXCGCAAAXG GGXGAAG.G G„ 

„AA™CAAXAXnAG=^X^CX^^ ~ 
ftlGXCrrXA GGGCXGGXAA AGGAXAXGAA TXXXCXATXG _ - 

XTCCGXGGXG XGXXXGGGXX XGrrrXAXAX GXXGGGACCX^ ^TxXC.GX 
rrXGGTAAGA XAGXGGGXAA XAAGGAGXCX XAAXCAXGGG AGTICIXXXG _ . 
™» XXXGGXCGGX XXCGXXGXGG XAA^ ; 
^MAAAGGG GXXGGCXAAG AXACCXAXXG CXAXXTCA^ ^« 
GGCITAACTO AAXXOXXC^G GGXXAXCTGX GXGAXATrAG GGCXCMm 
„^GAGGGXGXXGAG.AA«=X^.~^^— 

TCTCTGTAAA GGGTGCTAli 11 » »-rT*rrrTC TGGAAAGACG 

AXAAXAXGGG XGXXX^ — -~ 

^ GTTGATrTAA GGCTTCAAAA t,^i rT^TTGGGCG CGGTAATGAT. 

CXXAGAAXAG CGGAXAAGGG XXCXAXAXCX GArTXCCTTO C« 
XCCXACGATO AAA^TAAAAA CGGCTTG^ OXXCTOGATG — ^ 

.CCCGXXGXX GGAAXGAXAA ■C.AAAGAOAG CGGAX^; -™ ^^^^^^ 
-^.-rn.AT' rTTrrTTGTT CAGGACTTAT CTATTbii^^ 

GGTXCXGGAX XAGCIGAAGA TGXXGr^X ^^^^ 
TTIOTCGCTA CXTIAXArrC TCIXAXXACX GGCXCGAAAA TG
GTTGGCGTTG TTAAATATGG GGATTCTCAA TTAAGCCCTA CTGTTGAGGG TTGGCTTTAT
ACTGGTAAGA ATTTGTATAA CGCATATGAT ACTAAACAGG CTTTTTCTAG TAATTATGAT 
TCCGGTGTTT ATTCTTATTT AACGCCTTAT TTATCACACG GTCGGTATTT CAAACCATTA 
AATTTAGGTC AGAAGATGAA GCTTACTAAA ATATATTTGA AAAAGTTTTC ACGCGTTCTT 
TGTCTTGCGA TTGGATTTGC ATCAGCATTT ACATATAGTT ATATAACCCA ACCTAAGCCG 
GAGGTTAAAA AGGTAGTCTC TCAGACCTAT GATTTTGATA AATTCACTAT TGACTCTTCT 
CAGCGTCTTA ATCTAAGCTA TCGCTATGTT TTCAAGGATT CTAAGGGAAA ATTAATTAAT
AGCGACGATT TACAGAAGCA AGGTTATTCA CTCACATATA TTGATTTATG TACTGTTTCC
ATTAAAAAGG TAATTCAAAT GAAATTGTTA AATGTAATTA ATTTTGTTTT CTTGATGTTT
GTTTCATCAT CTTCTTTTGC TCAGGTAATT GAAATGAATA ATTCGCCTCT GCGGGATTTT
GTAACTTGGT ATTCAAAGCA ATCAGGCGAA TCCGTTATTG TTTCTCCCGA TGTAAAAGGT 
ACTGTTAGTG TATATTCATC TGACGTTAAA CCTGAAAATC TACGCAATTT CTTTATTTCT 
GTTTTACGTG CTAATAATTT TGATATGGTT GGTTCAATTC CTTCCATTAT TTAGAAGTAT 
AATCCAAACA ATCAGGATTA TATTGATGAA TTGCCATCAT CTGATAATCA GGAATATGAT 
GATAATTCCG CTCCTTCTGG TGGTTTCTTT GTTCCGCAAA ATGATAATGT TACTCAAACT 
TTTAAAATTA ATAACGTTCG GGCAAAGGAT TTAATACGAG TTGTCGAATT GTTTGTAAAG
TCTAATACTT CTAAATCCTC AAATGTATTA TCTATTGACG GCTCTAATCT ATTAGTTGTT 
AGTGCACCTA AAGATATTTT AGATAACCTT CCTCAATTCC TTTCTACTGT TGATTTGCCA 
ACTGACCAGA TATTGATTGA GGGTTTGATA TTTGAGGTTC AGCAAGGTGA TGCTTTAGAT
TTTTCATTTG CTGCTGGCTC TCAGCGTGGC ACTGTTGCAG GCGGTGTTAA TACTGACCGC
CTCACCTCTG TTTTATCTTC TGCTGGTGGT TCGTTCGGTA TTTTTAATGG CGATGTTTTA
GGGCTATCAG TTCGCGCATT AAAGACTAAT AGGCATTCAA AAATATTGTG TGTGCCACGT
ATTCTTACGC TTTCAGGTCA GAAGGGTTCT ATCTCTGTTG GCCAGAATGT CCCTTTTATT
ACTGGTCGTG TGACTGGTGA ATCTGCCAAT GTAAATAATC CATTTCAGAC GATTGAGCGT
CAAAATGTAG GTATTTCCAT GAGCGTTTTT CCTGTTGCAA TGGCTGGCGG TAATATTGTT
CTGGATATTA CCAGCAAGGC CGATAGTTTG AGTTCTTCTA CTCAGGCAAG TGATGTTATT
ACTAATCAAA GAAGTATTGC TACAACGGTT AATTTGCGTG ATGGACAGAC TCTTTTACTC
GGTGGCCTCA CTGATTATAA AAACACTTCT CAAGATTCTG GCGTACCGTT CCTGTCTAAA 
ATCCCTTTAA TCGGCCTCCT GTTTAGCTCC CGCTCTGATT CCAACGAGGA AAGCACGTTA 
TACGTGCTCG TCAAAGCAAC CATAGTACGC GCCCTGTAGC GGCGCATTAA GCGCGGGGGG 
TGTGGTGGTT ACGCGCAGCG TGACCGCTAC ACTTGCCAGC GCCCTAGCGC CCGCTCGTTT 
CGCTTTCTTC CCTTCCTTTC TCGCCACGTT CGCCGGCTTT CCCCGTCAAG CTCTAAATCG 
GGGGCTCCCT TTAGGGTTCC GATTTAGTGC TTTAGGGCAC CTCGACCCCA AAAAACTTGA 
TTTGGGTGAT GGTTCACGTA GTGGGCCATC GCCCTGATAG ACGGTTTTTC GCCCTTTGAC
GTTGGAGTCC ACGTTCTTTA ATAGTGGACT GTTGTTGGAA ACTGGAACAA CACTCAACCC
TATGTCGGGC TATTCTTTTG ATTTATAAGG GATTTTGCCG ATTTCGGAAC CAGCATCAAA 5880 
GAGGATTTTC GCCTGCTGGG GGAAACGAGC GTGGACCGCT TGCTGCAACT CTCTCAGGGC 5940
CAGGGGGTGA AGGGCAATCA GCTGTTGCCC GTCTCGCTGG TGAAAAGAAA AACCACCCTG 6000
GGGCGGAATA GGGAAACCGC CTGTGCCGGC GGGTTGGCGG ATTCATTAAT GGAGCTGGCA 6060
CGACAGGTTT CCCGAGTGGA TGAGCGGAAG GCAATTAATG TGAGTTAGCT 6120
CACTCATTAG GCACCCCAGG CTTTACACTT TATGCTTCCG GCTGGTATGT TGTGTGGAAT 6180
TGTGAGCGGA TAACAATTTC ACACAGGAAA CAGCTATGAC CAGGATGTAC GAATTCGCAG 6240
GTAGGAGAGC TCGGCGGATC CGAGGCTGAA GGCGATGACG CTGCTAAGGC TGGATTCAAT 6300
AGTTTACAGG CAAGTGCTAG TGAGTACATT GGCTACGCTT GGGCTATGGT AGTAGTTATA 6360
GTTGGTGCTA GCATAGGGAT TAAATTATTC AAAAAGTTTA CGAGCAAGGC TTCTTAACCA 6420
GCTGGGGTAA TAGCGAAGAG GCCCGGACCG ATCGCCCTTC CCAACAGTTG CGCAGCCTGA 6480
ATGGGGAATG GCGCTTTGCC TGGTTTGCGG CACCAGAAGC GGTGCCGGAA AGCTGGCTGG 6540
AGTGCGATCT TGGTGAGGCC GATACGGTCG TCGTCCCCTC AAACTGGCAG ATGCACGGTT 6600
ACGATGCGCG CATCTACACC AAGGTAACCT ATCCCATTAC GGTGAATCCG CCGTTTGTTC 6660
CCACGGAGAA TGCGACGGGT TGTTACTCGC TCACATTTAA TGTTGATGAA AGGTGGCTAC 6720
AGGAAGGCCA GACGCGAATT ATTTTTGATG GCGTTCCTAT TGGTTAAAAA ATGAGCTGAT 6780
TTAAGAAAAA TTTAACGCGA ATTTTAACAA AATATTAACG TTTACAATTT AAATATTTGC 6840
TTATACAATC TTCCTGTTTT TGGGGCTTTT CTGATTATCA ACCGGGGTAC ATATGATTGA 6900
CATGCTAGTT TTACGATTAC CGTTCATCGA TTCTCTTGTT TGCTCCAGAC TCTCAGGCAA 6960
TGACCTGATA GCCTTTGTAG ATCTCTGAAA AATAGGTACC CTCTCCGGCA TTAATTTATC 7020
AGCTAGAACG GTTGAATATC ATATTGATGG TGATTTGACT GTCTCCGGCC TTTCTCACCC 7080
TTTTGAATCT TTACCTACAC ATTACTCAGG CATTGCATTT AAAATATATG AGGGTTCTAA 7140
AAATTTTTAT CGTTGCGTTG AAATAAAGGC TTCTCCCGCA AAACrTArCAC AGGGTCATAA 7200
TGTTTTTGGT ACAACCGATT TAGCTTTATG CTGTGAGGCT TTATTGCTTA ATTTTGCTAA 7260
TTCTTTGCCT TGCCTGTATG ATTTATTGGA CGTT 7294

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 7394 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: both
(D) TOPOLOGY: circular

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

ICTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC

ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120

CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180 

GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 240 

TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300 

TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360

TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 420 

CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480 

TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540

AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600 

GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660

AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720 

ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780

TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840 

CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 900 

CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 960 

AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 1020 

TGTACACCGT TCATCTGTCG TCTTTCAAAG TTGGTCAGTT CGGTTCCCTT ATGATTGACC 1080 

GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140 

GAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 1200 

CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260 

GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 1320 

CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA 1380

CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 1440 

TGCGTGGGCG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 1500 

ATTCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGCCTTTT 1560

TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC 1620

TATTCTCACT CCGCTGAAAC TGTTGAAAGT TGTTTAGCAA AAGCCCATAC AGAAAATTCA 1680 

TTTACTAACG TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAACTA TGAGGGTTGT 1740 

CTGTGGAATG CTACAGGCGT TGTAGTTTGT ACTGGTGACG AAACTCAGTG TTACGGTACA 1800

TGGGTTCCTA TTGGGCTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA GGGTGGCGGT 1860 

TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT ACTAAACCTC CTGAGTACGG TGATACACCT 1920

ATTCCGGGCT ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG TACTGAGCAA 1980 

AACCCCGCTA ATCCTAATCC TTCTCTTGAG GAGTCTCAGC CTCTTAATAC TTTCATGTTT 2040

CAGAATAATA GGTTCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG CACTGTTACT 2100 




VrrcmM AACmnAC C^GTACACTC CTGTATC^TC AWVAGCOAIC 2; 

CAACOCa™ AC cr^^ M^ ..CTCCC.XX XCCATTCXCC CTTTAATOAA . 

TATGACGCn ACTGGAACGG TAAAiii-A ^^..^r rrrTrXCAAT 2 

====== : 

rrrCGTTCTG AGGGTGGCGG CTCTGAGGGA GGCGGTTCCG GTGGTC, 
S^S^AXOAAAAOAXCCC^^^^^^ 

O^^Z XAOACXCXOA CCCXAAAGCC AAA^ —XAC XCATXACGOX 
^ Ix mcAX XCCXOACaXX XCCOOCCXXO CXAAXGCXAA XOOXCOXACT 2 

— "CXAA XXCCCAAAX. — — ^ ^ ; 

— : ^ - 

TTTGTCTTTA GCGCTGGTAA ACC ^^..^^-.t^TATGTATGT ATTTTCTACG . • 2 

TTCCGTGGTG TCTTTGCGTr TCTTTTATAT GTTGCCACCT TTATGTAl 

TTCCGTGGTG . , ^AATCATGCC AGTTCTmG GGTATTCGGT ^ ^ 

rriGGTAAGA TACTGGGTAA ^^^^^^^^^'"^'^^^^ CTIACTTTTG 2 

TATTAriGCG TTTCCTCGGT TTCCTTCTGG TAACTTTGTT GGGCTATCT . ^ _ . 

:La.= cxxco.aaoaxaccxaxxc cxaxxx..c^^^^^^ , ■ 

, OOCX^AACTC AAXT<^=TO OCXXAXCXCT CXCAXa™ AA^ ^^^^ 

. ^oTXOAOOo xcxTCAom Axxcrccc^ cxAAXCcccx x^cxm 

TCTCIGIAAA CGCTGCXATT TrCATXmO ACGTTAAACA AAAAAXCGTI XGTTA ^ 
AXAAXAXGOC XG^A^ -GXGGCA AA^ACGCXG XGGAAA^ 
rcCTXACCG XXGGXAACAX XTAGGAXAAA AXrOXACGXC GGXGCAMA .C^^ . . ; 

™a CGCXTCAAAA COXGCCGCAA GXCGGCAGGI XCGOtAAAAC GCOXCGCGXT . 
CTIGATXXAA GGCIXCAAAA 1.1. cIAXXGGGCG CGCTAATGAT 

CXIAGAATAG GGGAXAAGCG XXGXAXAXCX GAXXXGGXXG 

XGGXAGGAXG AAAAXAAAAA CGGGXXGGrT GXXGXGGAXG A^G^*^ ^^ ^^ 
.CCGGXXGTT GGAAXGAXAA GGAAAGAGAG GGGAXXATTG ATT^- 

= = === 

™^ crmxAric xcrr^^ .-,„,„.„ggg xxgggxxxax 

GTIGGCGIiq XXAAAXAXGG CGAXXCXGAA XXAA TAAXXATGAX . 

— ~ ™= — X CAAAGCAXXA. 

TCGGGTGTrr ATTCmrrr AACGCCTTAT TTAi ^rrCGTTCTT 
^LaGGXG AGAAGAXGAA GGXXAGXAAA AXAXAXXXGA AA.^^^ 

.CXGXXGGGA TXGGAXXXGG AXGAGGAXXX ACAXAXA^ AX^^^A 

....... AGGXAGXCXC XGAGACGXAX GAXXXXGAXA AATICACXAX TGA 

OAGGXXAAAA AGGXAGXGX^ ^^^^ ^^^^ 

CAGGGTGTIA AXOXAAGCXA ICGCXAIGll 1

AGCGACGATT TACAGAAGCA AGGTTATTCA CTCACATATA TTGATTTATG TACTGTTTCC 4200

ATTAAAAAAG GTAATTCAAA TGAAATTGTT AAATGTAATT AATTTTGTTT TCTTGATGTT 4260 

TGTTTCATCA TCTTCTTTTG CTCAGGTAAT TGAAATGAAT AATTCGCCTC TGCGCGATTT 4320 
TGTAACTTGG TATTCAAAGC AATCAGGCGA ATCCGTTATT GTTTCTCCCG ATGTAAAAGG 4380

TACTGTTACT GTATATTCAT CTGACGTTAA ACCTGAAAAT CTACGCAATT TCTTTATTTC 4440

TGTTTTACGT GCTAATAATT TTGATATGGT TGGTTCAATT CCTTCCATAA TTCAGAAGTA 4500 

TAATCCAAAC AATCAGGATT ATATTGATGA ATTGCCATCA TCTGATAATC AGGAATATGA 4560 

TGATAATTCC GCTCCTTCTG GTGGTTTCTT TGTTCCGCAA AATGATAATG TTACTCAAAC 4620 

TTTTAAAATT AATAACGTTC GGGCAAAGGA TTTAATACGA GTTGTCGAAT TGTTTGTAAA 4680

GTCTAATACT TCTAAATCCT CAAATGTATT ATCTATTGAC GGCTCTAATC TATTAGTTGT 4740

TAGTGCACCT AAAGATATTT TAGATAACCT TCCTCAATTC CTTTCTACTG TTGATTTGCC 4800 

AACTGACCAG ATATTGATTG AGGGTTTGAT ATTTGAGGTT CAGCAAGGTG ATGCTTTAGA 4860 

TTTTTGATTT GCTGCTGGCT CTCAGCGTGG CACTGTTGCA GGCGGTGTTA ATACTGACCG 4920 

CCTCACCTCT GTTTTATCTT CTGCTGGTGG TTCGTTCGGT ATTTTTAATG GCGATGTTTT 4980

AGGGCTATCA GTTCGCGCAT TAAAGACTAA TAGCCATTCA AAAATATTGT CTGTGCCACG 5040 

TATTCTTACG CTTTCAGGTC AGAAGGGTTC TATCTCTGTT GGCCAGAATG TCCCTTTTAT 5100

TACTGGTCGT GTGACTGGTG AATCTGCCAA TGTAAATAAT CCATTTCAGA CGATTGAGGG 5160 

TCAAAATGTA GGTATTTCCA TGAGCCTTTT TCCTGTTGCA ATGGCTCGCG GTAATATTGT 5220 

TCTGGATATT ACCAGCAAGG CCGATAGTTT GAGTTCTTCT ACTCAGGCAA GTGATGTTAT 5280 

TACTAATCAA AGAAGTATTG CTACAACGGT TAATTTGCGT GATGGACAGA CTCTTTTACT 5340 

CGGTGGCCTC ACTGATTATA AAAACACTTC TCAAGATTCT GGCGTACCGT TCCTGTCTAA 5400 

AATCCCTTTA ATCGGCCTCC TGTTTAGCTC CCGCTCTGAT TCCAACGAGG AAAGCACGTT 5460 

ATACGTGCTC GTCAAAGCAA CCATAGTACG CGCCCTGTAG CGGCGCATTA AGCGCGGCGG 5520 

GTGTGGTGGT TACGCGCAGC GTGACCGCTA CACTTGCCAG CGCCCTAGCG CCCGCTCCTT 5580 

TCGGTTTCTT CCCTTCCTTT CTCGCCACGT TCGCCGGCTT TCGCCGTCAA GCTCTAAATC 5640

GGGGGCTCCC TTTAGGGTTC CGATTTAGTG CTTTACGGCA CCTCGACCCC AAAAAACTTG 5700

ATTTGGGTGA TGGTTCACGT AGTGGGCCAT CGCCCTGATA GACGGTTTTT CGCCCTTTGA 5760 

CGTTGGAGTC CACGTTCTTT AATAGTGGAC TCTTGTTCCA AACTGGAACA ACACTCAACC 5820

CTATCTCGGG CTATTCTTTT GATTTATAAG GGATTTTGCC GATTTCGGAA CCACCATCAA 5880 

ACAGGATTTT CGCCTGCTGG GGCAAACCAG CGTGGACCGC TTGCTGCAAC TCTCTCAGGG 5940

GCAGGCGGTG AAGGGCAATC AGCTGTTGCG CGTCTCGCTG GTGAAAAGAA AAACCACCCT 6000

GGCGCCCAAT ACGCAAACCG CCTCTCCCCG CGCGTTGGGC GATTCATTAA TGCAGCTGGC 6060 

ACGACAGGTT TCCCGACTGG AAAGCGGGCA GTGAGCGCAA CGCAATTAAT GTGAGTTAGC 6120 

TCACTCATTA GGCACCCCAG GCTTTACACT TTATGCTTCC GGCTCGTATG TTGTGTGGAA 6180

..rr ArTTrGCACT GGCCGTCGTT TTACAACGTC 6240

„=0. M^CCCTCOC =™==0M. C^=^^ , ,3,0 

^.O.C.AT. Tcn.c,^ -^^^^^ 

. CACGCAICOG GCACOIGMG GCCATGAOCO KCIMO ^ „,„,CTAC 

- - -i::": :~ 

CATAGGGATI AAAItATIOA AAAAGXTTAC GCGCTTIGCC 
.CCCGGACCG ATCGCGCXXC -^^^^'^^l..^^^ rC^^CC 
— CAOCAGAAGC GG.G.G « "^f^ „G CA.C.ACACC 
.ATACGGTCG TCGTCCCCXC AAAGTGGCAG A ^^^^^^^^ 
^CGTAAGGX ATCGGATXAC CGXCAAT - G^m ^ ^^^^^ ^^^^^^^ 
.CTTACTGGO .CAGAXrTAA TGXTGAXGAA „„,.CGA 
,™T0 GCGT.GG.A. XGGTTAAAA^ '^^^^^ 
,^CAA AATATIAACG THAGAAITT AAATAmOC 

^GGGGXm CTGATXATCA ACCGGGGTAC ATAtGATTGA AT^--^ ^ ^^^^ 
CGX.CA.CGA ™- .GCXCCAGAC XCXCAGGOAA ^^^^^^ 
^CTGTCAAA AATAGCXACC CTCTCCGGCA TXAATTXATC A^A^ ^^^^ 
.XA— TGATXXCACT GTGXCCGGCC XXXCXC^^ 

^CA^GCAXXX AAAAXAX^G.™-^^ . 

MATAAACGC XXGXGCCGGA AAAGXAXXAC ^ „ „,CXGIAXG 7380 

. ^GCirlAXG GXCXGACGCX XXAXXGCTXA AXTTXCOXAA XXGXXX ^^^^ . 

ATTTATTGGA CGTT

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 37 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
GATCCTAGGC TGAAGGCGAT GACCCTGCTA AGGCTGC 37

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 35 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear




(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
ATTCAATAGT TTACAGGCAA GTGCTACTGA GTACA 



(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 35 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
TTGGCTACGC TTGGGCTATG GTAGTAGTTA TAGTT 35 
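
Each entry in this listing states a LENGTH, and some sequence lines repeat the running base count at the end of the line. As a minimal sketch of how the two can be cross-checked, the following Python counts the bases printed for SEQ ID NO: 9 and compares the count with the trailing number. The helper name `check_listing_line` is hypothetical and is not part of the original disclosure; the sketch assumes a line holds only unambiguous bases (A, C, G, T), spaces, and an optional trailing count.

```python
import re
from typing import Optional, Tuple


def check_listing_line(line: str) -> Tuple[int, Optional[int]]:
    """Return (number of bases on the line, trailing count printed on the line or None)."""
    tokens = line.split()
    declared = None
    # A purely numeric final token is taken to be the printed base count.
    if tokens and tokens[-1].isdigit():
        declared = int(tokens.pop())
    bases = "".join(tokens)
    # Reject anything that is not an unambiguous base (assumption of this sketch).
    if not re.fullmatch(r"[ACGT]*", bases):
        raise ValueError(f"unexpected character in sequence: {bases!r}")
    return len(bases), declared


# SEQ ID NO: 9 as printed in the listing (declared LENGTH: 35 base pairs).
seq9_line = "TTGGCTACGC TTGGGCTATG GTAGTAGTTA TAGTT 35"
counted, declared = check_listing_line(seq9_line)
print(counted, declared)
```

A mismatch between the two numbers would suggest that bases were lost or duplicated when the listing was transcribed.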

(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 35 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 
GGTGCTACCA TAGGGATTAA ATTATTCAAA AAGTT 

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 
TACGAGCAAG GCTTCTTA 

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 
AGCTTAAGAA GCCTTGCTCG TAAACTTTTT GAATAATTT

(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 36 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
AATCCCTATG GTAGCACCAA CTATAACTAC TACCAT

(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 35 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:
AGCCCAAGCG TAGCCAATGT ACTCAGTAGC ACTTG 
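
Several of the short oligonucleotides in this listing appear to be complementary to one another. As an illustrative sketch (an observation from the printed sequences, not a statement taken from the specification), the Python below computes the reverse complement of SEQ ID NO: 14 and checks whether it spans the junction between the 3' end of SEQ ID NO: 8 and the 5' end of SEQ ID NO: 9. The function name and the overlap lengths of 18 and 17 bases are choices made for this example.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, ignoring spaces."""
    return seq.replace(" ", "").translate(COMPLEMENT)[::-1]


# Sequences as printed in the listing (spaces removed).
seq8 = "ATTCAATAGT TTACAGGCAA GTGCTACTGA GTACA".replace(" ", "")
seq9 = "TTGGCTACGC TTGGGCTATG GTAGTAGTTA TAGTT".replace(" ", "")
seq14 = "AGCCCAAGCG TAGCCAATGT ACTCAGTAGC ACTTG".replace(" ", "")

bridge = reverse_complement(seq14)
# Last 18 bases of SEQ ID NO: 8 followed by the first 17 bases of SEQ ID NO: 9.
junction = seq8[-18:] + seq9[:17]
print(bridge)
print(bridge == junction)
```

If the comparison holds, SEQ ID NO: 14 would anneal across the end of SEQ ID NO: 8 and the start of SEQ ID NO: 9, which is the arrangement one would expect for overlapping oligonucleotides that are annealed and ligated into a longer duplex.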

(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 34 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
CCTGTAAACT ATTGAATGCA GCCTTAGCAG GGTC 

(2) INFORMATION FOR SEQ ID NO: 16: 
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 16 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:
ATCGCCTTCA GCCTAG 

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:
CTCGAATTCG TACATCCTGG TCATAGC 



(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 
CATTTTTGCA GATGGCTTAG A 



(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
TAGCATTAAC GTCCAATA



(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
ATATATTTTA GTAAGCTTCA TCTTCT 



(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:
GACAAAGAAC GCGTGAAAAC TTT

(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 35 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:

GCGGGCCTCT TCGCTATTGC TTAAGAAGCC TTGCT 

(2) INFORMATION FOR SEQ ID NO:23: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 48 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 
TTCAGCCTAG GATCCGCCGA CCTCTCCTAC CTGCGAATTC GTACATCC 

(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 
TGGATTATAC TTCTAAATAA TGGA 

(2) INFORMATION FOR SEQ ID NO: 25:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 36 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 
TAAGACTCAT TCCGGATGGA ATTCTGGAGT CTGGGT

(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:
AATTCGCCAA GGAGACAGTC AT 22

(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 39 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:
AATGAAATAC CTATTGCCTA CGGCAGCCGC TGGATTGTT 39

(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:
ATTACTCGCT GCCCAACCAG CCATGGCCGA GCTCGTGAT 39



(2) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 39 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:
GACCCAGACT CCAGATATCC AACAGGAATG AGTGTTAAT

(2) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 13 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:
TCTAGAACGC GTC 13

(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 35 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:
ACGTGACGCG TTCTAGAATT AACACTCATT CCTGT

(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 39 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:32: 
TGGATATCTG GAGTCTGGGT CATCACGAGC TCGGCCATG

(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 39 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:
GCTGGTTGGG CAGCGAGTAA TAACAATCCA GCGGCTGCC 

(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 37 base pairs 
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:
GTAGGCAATA GGTATTTCAT TATGACTGTC CTTGGCG

(2) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:
TGACTGTCTC CTTGGCGTGT GAAATTGTTA 



(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 36 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:
TAACACTCAT TCCGGATGGA ATTCTGGAGT CTGGGT 



(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 
CAATTTTATC CTAAATCTTA CCAAC 



(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38: 
CATTTTTGCA GATGGCTTAG A 



(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 
CGAAAGGGGG GTGTGCTGCA A

(2) INFORMATION FOR SEQ ID NO: 40:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40:
TAGCATTAAC GTCCAATA 

(2) INFORMATION FOR SEQ ID NO: 41:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 43 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:
AAACGACGGC CAGTGCCAAG TGACGCGTGT GAAATTGTTA TCC

(2) INFORMATION FOR SEQ ID NO: 42:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 43 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:
GGCGAAAGGG AATTCTGCAA GGCGATTAAG CTTGGGTAAC GCC

(2) INFORMATION FOR SEQ ID NO: 43:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 36 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:
GGCGTTACCC AAGGTTTGTA CATGGAGAAA ATAAAG

(2) INFORMATION FOR SEQ ID NO: 44:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44:
TGAAACAAAG CACTATTGCA CTGGCACTCT TACCGTTACC GT 42 

(2) INFORMATION FOR SEQ ID NO: 45: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:45: 
TACTGTTTAC CCCTGTGACA AAAGCCGCCC AGGTCCAGCT GC 42

(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 44 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:46: 
TCGAGTCAGG CCTATTGTGC CCAGGGATTG TACTAGTGGA TCCG 44 

(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 38 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47:
TGGCGAAAGG GAATTCGGAT CCACTAGTAC AATCCCTG 38

(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48:
GGCACAATAG GCCTGACTCG AGCAGCTGGA CCAGGGCGGC 



(2) INFORMATION FOR SEQ ID NO: 49: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 42 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49:
TTGTCACAGG GGTAAACAGT AACGGTAACG GTAAGTGTGC CA



(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50:
GTGCAATAGT GCTTTGTTTC ACTTTATTTT CTCCATGTAC

(2) INFORMATION FOR SEQ ID NO: 51:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51:
TAACGGTAAG AGTGCCAGTG C

(2) INFORMATION FOR SEQ ID NO: 52:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 68 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: misc_difference

(D) OTHER INFORMATION: /note= "M REPRESENTS AN EQUAL
MIXTURE OF A AND C AT THIS LOCATION AND AT
LOCATIONS 28, 31, 34, 37, 40, 43, 46 & 49"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52:
AGCTCCCGGA TGCCTCAGAA GATGMNNMNN MNNMNNMNNM NNMNNMNNMN NGGCTTTTGC 60
CACAGGGG 68

(2) INFORMATION FOR SEQ ID NO: 53: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 54 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ix) FEATURE:

(A) NAME/KEY: misc_difference

(D) OTHER INFORMATION: /note= "M REPRESENTS AN EQUAL
MIXTURE OF A AND C AT THIS LOCATION AND AT
LOCATIONS 20, 23, 26, 29, 32, 35, 38, 41, 44 & 50"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53:
CAGCCTCGGA TCCGCCMNNM NNMNNMNNMN NMNNMNNMNN MNNMNNATGM GAAT 54
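
The feature note above defines M as an equal mixture of A and C, so each MNN triplet in this oligonucleotide stands for a set of concrete triplets. The following Python sketch enumerates that set and, on the assumption (not stated in this listing) that the oligonucleotide represents the strand complementary to the coding strand, translates the reverse complement of each triplet with the standard genetic code. The names `IUPAC`, `expand`, and `CODON_TABLE` are introduced only for this illustration; K is the standard IUPAC symbol for G or T.

```python
from itertools import product

# IUPAC symbols used here: M = A or C (per the feature note), K = G or T, N = any base.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "AC", "K": "GT", "N": "ACGT"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand(pattern: str) -> list:
    """List every concrete sequence matched by a degenerate IUPAC pattern."""
    return ["".join(p) for p in product(*(IUPAC[ch] for ch in pattern))]


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Read each MNN triplet on the complementary strand, which gives NNK codons.
sense_codons = sorted({reverse_complement(c) for c in expand("MNN")})
amino_acids = {CODON_TABLE[c] for c in sense_codons}
stops = [c for c in sense_codons if CODON_TABLE[c] == "*"]
print(len(sense_codons), len(amino_acids - {"*"}), stops)
```

Under that strand assumption the 32 triplets cover all 20 amino acids while admitting only one of the three stop codons (TAG), which is a common reason for restricting one position of a random codon to two bases.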

(2) INFORMATION FOR SEQ ID NO: 54: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54:
GGTAAACAGT AACGGTAAGA GTGCCAG

(2) INFORMATION FOR SEQ ID NO: 55:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55:
GGGCTTTTGC CACAGGGGT 19

(2) INFORMATION FOR SEQ ID NO: 56: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 63 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56:

^rT^CAGCTC CGGATCCCTC AGAAGTCATA • -^ACCCCCCAT AGGCmTGC 
AGGGTCATGG CCTTCAGCTC CGOAiov. 

CAC 

(2) INFORMATION FOR SEQ ID NO: 57: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 47 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57:

TCG^CTTCAG CTCCCGGATG CCTCAGAAGC ATGAACCC^^^ 

(2) INFORMATION FOR SEQ ID NO: 58: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58:

CAATTTTATC CTAAATCTTA CCAAC

(2) INFORMATION FOR SEQ ID NO: 59:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59:

GCCTTCAGCC TCGGATCCGC C 

(2) INFORMATION FOR SEQ ID NO: 60: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60:
CGGATGCCTC AGAAGCCCCN N

(2) INFORMATION FOR SEQ ID NO: 61:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61:
CGGATGCCTC AGAAGGGCTT TTGCCACAGG

I CLAIM:



1. A composition of matter comprising a
plurality of cells containing a diverse population of
expressible oligonucleotides operationally linked to
expression elements, said expressible oligonucleotides
having a desirable bias of random codon sequences
produced from random combinations of first and second
oligonucleotide precursor populations having a desirable
bias of random codon sequences.

2. The composition of claim 1, wherein the
desirable bias of random codon sequences of said first
and second oligonucleotides is unbiased.

3. The composition of claim 1, wherein the
desirable bias of random codon sequences of said first
and second oligonucleotides is biased toward a
predetermined sequence.

4. The composition of claim 1, wherein said
first and second oligonucleotides having random codon 
sequences have at least one specified codon at a 
predetermined position. 

5. The composition of claim 1, wherein said 
cells are procaryotes. 

6. The composition of claim 1, wherein said
cells are E. coli.

7. A kit for the preparation of vectors useful
for the expression of a diverse population of random
peptides from combined first and second oligonucleotides
having a desirable bias of random codon sequences,
comprising: two vectors: a first vector having a cloning
site for said first oligonucleotides and a pair of
restriction sites for operationally combining first
oligonucleotides with second oligonucleotides; and a
second vector having a cloning site for said second
oligonucleotides and a pair of restriction sites
complementary to those on said first vector, one or both
vectors containing expression elements capable of being
operationally linked to said combined first and second
oligonucleotides.

8. The kit of claim 7, wherein said vectors 
are in a filamentous bacteriophage. 

9. The kit of claim 8, wherein said 
filamentous bacteriophage are M13.

10. The kit of claim 7, wherein said vectors 
are plasmids. 

11. The kit of claim 7, wherein said vectors 
are phagemids. 

12. The kit of claim 7, wherein the desirable 
bias of random codon sequences of said first and second 
oligonucleotides is unbiased.

13. The kit of claim 7, wherein the desirable
bias of random codon sequences of said first and second
oligonucleotides is diverse but biased toward a
predetermined sequence.

14. The kit of claim 7, wherein said first and
second oligonucleotides having a desirable bias of random 
codon sequences have at least one specified codon at a 
predetermined position. 

15. The kit of claim 7, wherein said pair of 
restriction sites are Fok I.

16. A cloning system for expressing random
peptides from diverse populations of combined first and
second oligonucleotides having a desirable bias of random 
codon sequences, comprising: a set of first vectors 
having a diverse population of first oligonucleotides 
having a desirable bias of random codon sequences and a 
set of second vectors having a diverse population of 
second oligonucleotides having a desirable bias of random 
codon sequences, said first and second vectors each 
haying a pair of restriction sites so as to allow the 
operational combination of first and second 
oligonucleotides into a contiguous oligonucleotide having 
a desirable bias of random codon sequences. 

17. The cloning system of claim 16, wherein
the desirable bias of random codon sequences of said
first and second oligonucleotides is unbiased.

18. The cloning system of claim 16, wherein
the desirable bias of random codon sequences of said
first and second oligonucleotides is diverse but biased
toward a predetermined sequence.

19. The cloning system of claim 16, wherein
said first and second oligonucleotides having a desirable
bias of random codon sequences have at least one
specified codon at a predetermined position.

20. The cloning system of claim 16, wherein
said combined first and second vectors is through a pair
of restriction sites.

21. The cloning system of claim 16, wherein
said pair of restriction sites are Fok I.
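
Claims 15 and 21 name Fok I as the pair of restriction sites. Fok I is a type IIS enzyme: it recognizes the sequence GGATG and cuts a short distance downstream of it, outside the recognition sequence. A minimal sketch of locating such sites in a vector sequence follows; the function name `find_foki_sites` and the example sequence are hypothetical, and the code only finds recognition sites, it does not model cleavage.

```python
FOKI_SITE = "GGATG"     # Fok I recognition sequence, top strand, 5'->3'
FOKI_SITE_RC = "CATCC"  # the same site when it lies on the bottom strand


def find_foki_sites(seq):
    """Return sorted (position, strand) pairs for Fok I recognition sites.

    position is the 0-based index of the first base of the site as read
    on the top strand; strand is "+" or "-".
    """
    seq = seq.upper()
    hits = []
    for site, strand in ((FOKI_SITE, "+"), (FOKI_SITE_RC, "-")):
        start = seq.find(site)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(site, start + 1)
    return sorted(hits)
```

For the made-up sequence `"AAGGATGTTTTCATCCAA"` this reports one site on each strand, an inward-facing pair of the kind that could flank an insert.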

22. A composition of matter comprising a
plurality of cells containing a diverse population of
expressible oligonucleotides operationally linked to
expression elements, said expressible oligonucleotides
having a desirable bias of random codon sequences.

23. The composition of claim 22, wherein said 
cells are procaryotes. 

24. The composition of claim 22, wherein said 
expressible oligonucleotides are expressed as peptide 
fusion proteins on the surface of a filamentous 
bacteriophage. 

25. The composition of claim 22, wherein said
filamentous bacteriophage is M13.

26. The composition of claim 22, wherein said 
fusion protein contains the product of gene VIII. 

27. The composition of claim 22, wherein said
diverse population of oligonucleotides having a desirable
bias of random codon sequences are produced from the
combination of diverse populations of first and second
oligonucleotides having a desirable bias of random codon
sequences.



WO 92/06176 PCT/US91/07141

28. The composition of claim 22, wherein the
desirable bias of random codon sequences of said
oligonucleotides is unbiased.

29. The composition of claim 22, wherein the
desirable bias of random codon sequences of said
oligonucleotides is diverse but biased toward a
predetermined sequence.

30. The composition of claim 22, wherein said
oligonucleotides having a desirable bias of random codon
sequences have at least one specified codon at a
predetermined position.

31. A plurality of vectors containing a
diverse population of expressible oligonucleotides, having
a desirable bias of random codon sequences.

32. The vectors of claim 31, wherein said
oligonucleotides are expressible as fusion proteins on
the surface of filamentous bacteriophage.

33. The vectors of claim 31, wherein said
filamentous bacteriophage is M13.

34. The vectors of claim 31, wherein said 
fusion protein contains the product of gene VIII. 

35. The vectors of claim 31, wherein the
desirable bias of random codon sequences of said
oligonucleotides is unbiased.

36. The vectors of claim 31, wherein the
desirable bias of random codon sequences of said
oligonucleotides is diverse but biased toward a
predetermined sequence.

37. The vectors of claim 31, wherein said
oligonucleotides having a desirable bias of random codon
sequences have at least one specified codon at a
predetermined position.

38. A composition of matter, comprising a
diverse population of oligonucleotides having a desirable
bias of random codon sequences produced from random
combinations of two or more oligonucleotide precursor
populations having a desirable bias of random codon
sequences.
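
Claim 38 describes a population produced from random combinations of two or more precursor populations. As a hedged sketch of why this raises diversity, the snippet below joins members of two precursor lists end to end, so the number of possible combined sequences is the product of the two population sizes. The function name `combine_populations` and its parameters are hypothetical, and simple string concatenation stands in for the vector-based joining recited in the claims.

```python
import itertools
import random


def combine_populations(first, second, n=None, rng=None):
    """Join oligonucleotides from two precursor populations end to end.

    With n=None every first/second pairing is returned; otherwise n
    pairings are drawn at random (with replacement).
    """
    if n is None:
        return [a + b for a, b in itertools.product(first, second)]
    rng = rng or random.Random()
    return [rng.choice(first) + rng.choice(second) for _ in range(n)]
```

For example, two precursor populations of 1,000 members each could in principle yield up to 1,000,000 distinct combined oligonucleotides.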

39. A method of constructing a diverse
population of vectors having combined first and second
oligonucleotides having a desirable bias of random codon
sequences capable of expressing said combined
oligonucleotides as random peptides, comprising the steps
of:

(a) operationally linking sequences from a
diverse population of first
oligonucleotides having a desirable bias
of random codon sequences to a first
vector;

(b) operationally linking sequences from a
diverse population of second
oligonucleotides having a desirable bias
of random codon sequences to a second
vector; and

(c) combining the vector products of steps (a)
and (b) under conditions where said
populations of first and second
oligonucleotides are joined together into
a population of combined vectors capable
of being expressed.

40. The method of claim 39, wherein the
desirable bias of random codon sequences of said first
and second oligonucleotides is unbiased.

41. The method of claim 39, wherein the
desirable bias of random codon sequences of said first
and second oligonucleotides is diverse but biased toward
a predetermined sequence.

42. The method of claim 39, wherein said first
and second oligonucleotides having a desirable bias of
random codon sequences have at least one specified codon
at a predetermined position.

43. The method of claim 38, wherein steps (a)
through (c) are repeated two or more times.

44. A method of selecting a peptide capable of
being bound by a ligand binding protein from a population
of random peptides, comprising:

(a) operationally linking a diverse population
of first oligonucleotides having a
desirable bias of random codon sequences
to a first vector;

(b) operationally linking a diverse population
of second oligonucleotides having a
desirable bias of random codon sequences
to a second vector;

(c) combining the vector products of steps (a)
and (b) under conditions where said
populations of first and second
oligonucleotides are joined together into
a population of combined vectors;

(d) introducing said population of combined
vectors into a compatible host under
conditions sufficient for expressing said
population of random peptides; and

(e) determining the peptide which binds to
said ligand binding protein.

45. The method of claim 44, wherein the
desirable bias of random codon sequences of said first
and second oligonucleotides is unbiased.

46. The method of claim 44, wherein the
desirable bias of random codon sequences of said first
and second oligonucleotides is diverse but biased toward
a predetermined sequence.

47. The method of claim 44, wherein said first
and second oligonucleotides having a desirable bias of
random codon sequences have at least one specified codon
at a predetermined position.

48. The method of claim 44, wherein steps (a)
through (c) are repeated two or more times.

49. A method for determining the nucleic acid
sequence encoding a peptide capable of being bound by a
ligand binding protein which is selected from a
population of random peptides, comprising:

(a) operationally linking a diverse population
of first oligonucleotides having a
desirable bias of random codon sequences
to a first vector;

(b) operationally linking a diverse population
of second oligonucleotides having a
desirable bias of random codon sequences
to a second vector;

(c) combining the vector products of steps (a)
and (b) under conditions where said
populations of first and second
oligonucleotides are joined together into
a population of combined vectors;

(d) introducing said population of combined
vectors into a compatible host under
conditions sufficient for expressing said
population of random peptides;

(e) determining the peptide which binds to
said ligand binding protein;

(f) isolating the nucleic acid encoding said
peptide; and

(g) sequencing said nucleic acid.

50. The method of claim 49, wherein the
desirable bias of random codon sequences of said first
and second oligonucleotides is unbiased.

51. The method of claim 49, wherein the
desirable bias of random codon sequences of said first
and second oligonucleotides is diverse but biased toward
a predetermined sequence.

52. The method of claim 49, wherein said first
and second oligonucleotides having a desirable bias of
random codon sequences have at least one specified codon
at a predetermined position.

53. The method of claim 49, wherein steps (a)
through (c) are repeated two or more times.

54. A method of constructing a diverse
population of vectors containing expressible
oligonucleotides having a desirable bias of random codon
sequences, comprising operationally linking a diverse
population of oligonucleotides having a desirable bias of
random codon sequences to expression elements.

55. The method of claim 54, wherein said 
oligonucleotides are expressible as fusion proteins on 
the surface of filamentous bacteriophage. 

56. The method of claim 54, wherein said
filamentous bacteriophage are M13.

57. The method of claim 54, wherein said
fusion protein contains the product of gene VIII.

58. The method of claim 54, wherein the
desirable bias of random codon sequences of said
oligonucleotides is unbiased.

59. The method of claim 54, wherein the 
desirable bias of random codon sequences of said 
oligonucleotides is diverse but biased toward a 
predetermined sequence. 

60. The method of claim 54, wherein said 
oligonucleotides having a desirable bias of random codon 
sequences have at least one specified codon at a 
predetermined position. 

61. The method of claim 54, wherein said
operationally linking further comprising the steps of:

(a) operationally linking a diverse population
of first oligonucleotides having a
desirable bias of random codon sequences
to a first vector;

(b) operationally linking a diverse population
of second oligonucleotides having a
desirable bias of random codon sequences
to a second vector; and

(c) combining the vector products of steps (a)
and (b) under conditions where said
populations of first and second
oligonucleotides are joined together into
a population of combined vectors.

62. The method of claim 61, wherein steps (a)
through (c) are repeated two or more times.

63. A method of selecting a peptide capable of
being bound by a binding protein from a population of
random peptides, comprising:

(a) operationally linking a diverse population
of oligonucleotides having a desirable
bias of random codon sequences to
expression elements;

(b) introducing said population of vectors
into a compatible host under conditions
sufficient for expressing said population
of random peptides; and

(c) determining the peptide which binds to
said ligand binding protein.

64. The method of claim 63, wherein said 
population of random peptides are expressed as fusion 
proteins on the surface of filamentous bacteriophage. 

65. The method of claim 63, wherein said 
filamentous bacteriophage are M13. 

66. The method of claim 63, wherein said 
fusion protein contains the product of gene VIII. 

67. The method of claim 63, wherein the 
desirable bias of random codon sequences of said 
oligonucleotides is unbiased. 

68. The method of claim 63, wherein the
desirable bias of random codon sequences of said
oligonucleotides is diverse but biased toward a
predetermined sequence.

69. The method of claim 63, wherein said
oligonucleotides having a desirable bias of random codon
sequences have at least one specified codon at a
predetermined position.

70. The method of claim 63, wherein step (a)
further comprises:

(a1) operationally linking a diverse population
of first oligonucleotides having a
desirable bias of random codon sequences
to a first vector;

(a2) operationally linking a diverse population
of second oligonucleotides having a
desirable bias of random codon sequences
to a second vector; and

(a3) combining the vector products of steps (a)
and (b) under conditions where said
populations of first and second
oligonucleotides are joined together into
a population of combined vectors.

71. The method of claim 70, wherein steps (a1)
through (a3) are repeated two or more times.

72. A method of determining the nucleic acid
sequence encoding a peptide capable of being bound by a
ligand binding protein which is selected from a
population of random peptides, comprising:

(a) operationally linking a diverse population
of oligonucleotides having a desirable
bias of random codon sequences to
expression elements;

(b) introducing said population of vectors
into a compatible host under conditions
sufficient for expressing said population
of random peptides;

(c) determining the peptide which binds to
said ligand binding protein;

(d) isolating the nucleic acid encoding said
peptide; and

(e) sequencing said nucleic acid.

73. The method of claim 72, wherein said
population of random peptides are expressed as fusion
proteins on the surface of filamentous bacteriophage.

74. The method of claim 72, wherein said
filamentous bacteriophage are M13.

75. The method of claim 72, wherein said
fusion protein contains the product of gene VIII.

76. The method of claim 72, wherein the
desirable bias of random codon sequences of said
oligonucleotides is unbiased.




77. The method of claim 72, wherein the 
desirable bias of random codon sequences of said 
oligonucleotides is diverse but biased toward a 
predetermined sequence. 

78. The method of claim 72, wherein said 
oligonucleotides having a desirable bias of random codon 
sequences have at least one specified codon at a 
predetermined position.

79. The method of claim 72, wherein step (a)
further comprises:

(a1) operationally linking a diverse population
of first oligonucleotides having a
desirable bias of random codon sequences
to a first vector;

(a2) operationally linking a diverse population
of second oligonucleotides having a
desirable bias of random codon sequences
to a second vector; and

(a3) combining the vector products of steps (a)
and (b) under conditions where said
populations of first and second
oligonucleotides are joined together into
a population of combined vectors.

80. The method of claim 78, wherein steps (a1)
through (a3) are repeated two or more times.

81. A vector comprising two copies of a gene
encoding a filamentous bacteriophage coat protein, both
copies encoding substantially the same amino acid
sequence but having different nucleotide sequences.

82. The vector of claim 81, wherein said
filamentous bacteriophage is M13.

83. The vector of claim 81, wherein said gene
is gene VIII.

84. The vector of claim 81, wherein said
vector has substantially the sequence shown in Figure 5
(SEQ ID NO: 1).

85. A vector comprising two copies of a gene
encoding a filamentous bacteriophage coat protein, one
copy of said gene capable of being operationally linked
to an oligonucleotide wherein said oligonucleotide can be
expressed as a fusion protein on the surface of said
filamentous bacteriophage or as a soluble peptide.

86. The vector of claim 84, wherein said one
copy of said gene is expressed on the surface of said
filamentous bacteriophage.

87. The vector of claim 84, wherein said
bacteriophage coat protein is M13 gene VIII.

-1 AATGCTAGTA CTATTA6TAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAiJ 60
61 ATAGCTAAAC AG6TTATTGA CCATTTGCGA AATGTATCTA ATG6TCAAAC TAAATCTACT 120 
121 eGTTCGCAGA ATTG6GAATC AACTGHACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180 
181 GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 2S0 
I'll TCTGCAAAAA ibACCTCTTA TCAAAAGGAG CAATTAAA6G lACTCTCTAA TCCTGACCTG 300 
301 TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTC6AA TTAAAACGCG ATATTTGAAG 360 
361 TCTTTCG6GC TTCCTCTTAA TCTTTTTGAT GGAATCCGCT TTGCTTCTGA CTATAATAG.T ^20 
m CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGJ TTTCTGAAGT 6TTTAAA6CA ^80. 
^81 TTT6AG6GGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCA6TGT 5^*0 
l^i ^AACATTTTA CTATTACCCC GTCTG6GAAA ACTTGTTTTG CAAAAGCCTG TCGCTATTTT. 600 
601 GGTTTTTATC GTCGTCTGGT AAACGA66GT TATGATAGT6 TTGCTCTTAC TATGCCTCGT 660 
661 AATJCCTTTT G6CGTTATGT ATCTGCAHA GTTGAATGTG 6TATTCGTAA ATCTCAACTG 720 
721 ATGAATCTTT CTAGCTGTAA TAATGTTGTT GGGTTA6TTC GTTTTATTAA CGTAGATTTT 780 
781 TCTTCCCAAC GTGCT6ACT6 6TATAATGAG GCAGTTCTTA AAATGGCATA AGGTAAHGA 810 
841 CAATGATTAA AGTT6AAATT AAACCATCTG AAGGCGAAT.T TACTACTCGT TCTGGTGTTT 900. 
§?l ^^^^9^JW. TCACTGAATG AGGAGCTTTG TTACGTTGAT TTGGGTAATC- 960 

;961-AATATCCG6T TCTTGTCAAG AHACTGITG ATGAAGGTCA GGCAGCCTAT GCGCGTGGTC 1020 
1021 TGTACACCGT TCATCTGTCC TCTTTCAAAG TT6GTCA6TT CGGTTCCCn ATGATTGAGG 1080 
1081 6TCTGCGCCT CGTTCCGGCT AAGTAACAT6 6AGGAGGTCG CG6ATTTC6A CACAATTTAT mO 
mi CAGGCGATGA TACAAATCTC CGTTGTAGTT TGTTTGGG6G TTGGTATAAT GGGTGGGGGT 1200 
1201 CAAAGATGAG T6TTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260 
1251 6TGGCATTAC GTATTTTACG CGTTTAATGG AAAGTTCGTG ATGAAAAAGT GTTTAGTGCT 1320 
1321 CAAAGCCTCT GTAGGCGHG CTACGGTGGT TCGGATGGTG TCTTTGGCTG GTGAGGGTGA 1580 
1381 CGATCCCGCA AAAGCGGGCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGHA im 
l^fiMGCGTGGGGG ATGGTTGTTG TGATTGTCGG CGGAAGTATG GGTATCAAGC TGTTTAAGAA 1500- 
1501 ATTCACCTCG AAAGGAAGGT GATAAACGGA TAGAATTAAA GGGTGCTTn GGAGCGnTT 1560 
1561 TTTTTGGA6A TTTTCAACGT 6AAAAAATTA TTAnCGCAA TTCCTTTAGT TGnCCTHG 1520 
1621 TATTCTCAGT CC6GTGAAAG TGTTGAAAGT TGTTTAGGAA AAGCCGATAC AGAAAATTCA 1580 
1681 TTTACTAACG TCTGGAAAGA CGACAAAACT TTAGATCGTT AC6CTAACTA T6AGGGTT6T 17^10 
mi CT6T6GAATG GTAGAGGCGT TGTAGTTTGT AGTGGTGAGG AAACTCAGTG TTACGGTAGA 1800 
1801 T6GGTTCCTA TTGGGCTT6C TATGCCTGAA AATGAGGGT6 6TG6CTCTGA GGGTGGGGGT 1860 
.1861 TCT6A66GTG GCGGTTGTGA GGGTGGGGGT ACTAAAGCTG CTGAGTAGGG TGATACACCT 1920 
1921 ATTCCG6GCT ATAGTTATAT GAACGGTGTC GAGGGGACTT ATGCGGGTGG TACTGAGCAA 1980 
1981 AACCCCGCTA ATCGTAATCC HCTGTTGAG GAGTGTGAGC CTGTTAATAC TTTCATGrTT 20a0 
20'»1 CA6AATAATA G6TTCCGAAA TAGGGAGGGG GCATTAACTG HTATACGGG GAGTGTTAGT 2100 
2101 CAAGGCAGTG AGGGCGTTAA AAGTTATTAG GAGTAGACTC GTGTATCATG AAAAGGCATG 2160 
2161 TATGAGGGTT ACTGGAACGG TAAATTGAGA GAGTGGGCTT TGGATTCTGG GnTAATGAA 2220 
2221 GATCGATTGG TTTGTGAATA TCAAGGCGAA TCGTCTGAGC TCGGTCAACC TCGTGTCAAT 2280 
2281 GCTGGCGGCG GCTCTGGTGG TGGTTCTGGT GGCGGCTCTG AGG6TGGTGG CTCTGAGGGT 23'»0 
23'»1 GGGGGTTGTG AGGGTGGCGG GTGTGAGGGA GGGGGHCGG GTGGTGGCTC TGGTTCCGGT 2400 
2'JOl GATHTGAH ATGAAAAGAT GGGAAACGGT AATAAGGGGG GTATGACGGA AAATGCCGAT 2^150 
2'»61 GAAAAGGGGC TACAGTCTGA G6GTAAAGGG AAAGHGAH GTGTGGCTAC TGAHACGGT. 2520 
2521 GGTGGTATGG ATGGHTCAT TGGTGACGTT TGGGGGGTTG CTAATGGTAA T6GT6CTACT -2580 
2581 GGTGATriTG CTGGCTCTAA TTGGCAAATG GCTCAAGTCG GTGAGGGTGA TAATTCACCT 2610 
2611 TTAAT6AATA AHTCCGTCA ATAHTACCT TCCCTCCCTC AATCGGTTGA ATGTCGCCCT 2700 
2701 TTTGTGTTTA GGGCTGGTAA ACGATATGAA TTTTCTAnG ATTGTGACAA AATAAACTTA 2760 
2761. nCCGTGGTG TGTHGCGH TCHTTATAT GnCCCACCT HATGTATGT ATTTTCTAC6 2820 
2821 TTTGCTAACA TACT6CGTAA TAAG6AGTCT TAATCATGCC AGTTCTTHG GGTAHCCGT 2880 
2881 TAHATTGGG HTCGTCGGT TTGGTTCTGG TAACTTTGTT GCGGTATCTG CnACTTTTC 2910 
2911 TTAAAAA66G CTTCGGTAAG ATAGCTATTG CTATTTCATT GTTTCnGCT CnATTAHG 3000^ 
3001 GGCHAAGTC AATTCTTGTG GGTTATCTCT GTGATAHAG CGCTCAAHA CCCTCTGACT 3060 
3061 TTGTTCAG6G TGTTCAGTTA ATTCTCCCGT GTAATGG6CT TCGCTGTTTT TATGHATTG 3120 
3121 TCTCTGTAAA GGCTGCTAH TTCATTTTTG ACGHAAACA AAAAATCGH TCnATriGG 3180 
3181 ATTGGGATAA ATAATATGGC TGTnATTTT GTAACTGGCA AAHAGGGTC TGGAAAGACG 3210 
3211 GTGGHAGGG HGGTAAGAT TGAGGATAAA AHGTAGCTG GGTGGAAAAT AGGAAGTAAT 3300 
3301 GGTGATHAA GGGHCAAAA CCTCCCGCAA GTCGGGAGGT TCGCTAAAAC GCCTCGCGTT 3360 
3361 CTTAGAATAG GG6ATAAGCG TTGTATATCT GATTTGGTTG CTAnGGGGG GGGTAATGAT 3120 
3121 -TCCTAGGATG AAAATAAAAA CGGCnGCTT GnCTeGATG AGTGCGGTAC TTGGTrTAftT 3180 
3181 AGCCGHGTT GGAATGATAA GGAAAGACAG GCGATTAnG ATTGGTHGT ACATGCTC6T 3510 
3511 AAATTAG6AT GGGATATTAT CTTGCTTGTT CAGGACHAT CTATTGTTGA TAAACAGGCG 3600 
3601 CGTTCTGCAT TAGCTGAACA TCTTGHTAT TGTGGTGGTC TGGAGAGAAT TACTHACCT 3660 
3551 TTTGTGGGTA GTTTATATTG TGTTATTAGT GGGTG6AAAA TGCGTGTGGG TAAATTACAT 3720 . 
3721 GTTGGGGHG TTAAATATGG GGATTGTGAA HAAGGGGTA CTGTTGAGGG TTGGGTTTAT 3780 
3781 ACTG6TAAGA ATTT6TATAA GGGATATGAT AGTAAAGAGG GTTrTTGTAG TAATTAT6AT 3810

1 AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60
61 ATAGCTAAAC AGGTTATT6A CCATHGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120 
121 C6TTCGCAGA ATTG6GAATC AACT6TTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180 
181 GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGAHC AGCAAHAAG CTGTAAGCCA 2^*0 
2^11 TCTGGAAAAA T6ACCTCTTA TCAAAAGGAG CAATTAAA6G TACTCTCTAA TCCTGACCTG 300 
301 TTGGAGTTTG CTTCCGGTCT GGTTC6CTTT 6AAGCTCGAA TTAAAACGCG ATATTTGAAG 360 
361 TCTTTCGGGC TTCCTCTTAA TCTTTJTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT ^20 
m CAGGGTAAAG ACCT6ATTTT TGATTTATGG FCATTCTCGT TTTCTGAACT GTTTAAAGCA m 
m TTT6AGGGGG ATTCAATGAA TAHTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 5^0. 
Sm AAACATTTTA CTAHACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600' 
601 GGTTTTTATC 6TCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGGCTCGT 660 
561 AATTCCTTn GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720: 
721 ATGAATCTTT CTACCTGTAA TAATGTTGn CCGTTAGTTC 6TTTTATTAA CGTAGATTTT 780 
781 TCTTCCCAAC GTCCTGACT6 GTATAAT6AG CCAGTTCTTA AAATCGCATA AGGTAAHCA 8^0 
8^1 CAATGAHAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 900 
901 CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT nGGGTAATG 960 
961-AATATCCGGT TCTTGTCAA6 ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 1020 
1021 T6TACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGHCCCn ATGATTGACC 1080 
mi GTCTGCGCCT CGHCGGGCT AAGTAACATG GAGCAGGTC6 C66ATTTCGA CACAATTTAT 11^0 
1111 CAG6CGATGA TACAAATCTC CGTTGTACTT TGTTTCGGGC TTGGTATAAT CGCTGGGGGT 1200 
1201 CAAAGATGAG TGHTTAGIG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260 
1251 GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 1320 
1321 CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGAT6CTG TCTTTCGGTG CTGAGGGTGA 1380 
1381 CGATCCCGCA AAA6CGGCCT TTAACTCCCT GCAA6CCTCA 6CGACC6AAT ATATCGGTTA mO 
mi TGCGTGGGCG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAA6AA 1500 
1501 ATTCACCTC6 AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGCCTTTT 1560 
1551 TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCniC 1620 
1621 TATTCTCACT CCGCTGAAAC TGTT6AAAGT TGTTTAGCAA AACCCCATAC AGAAAATTCA 1680 
1681 THACTAACG TCTGGAAAGA C6ACAAAACT TTAGATCGTT ACGCTAACTA TGAGGGHGT 17i|0 
17m CTGTGGAATG CTACAGGCGT TGTAGTTTGT ACTGGTGACG AAACTCA6TG HACGGTACA 1800 
1801 T6G6TTCCTA nGGGCHGC TATCCCT6AA AATGAGG6TG GTGGCTCTGA GGGT6GC6GT 1860 
1861 TCT6AG6GTG GCGGHCTGA GGGTGGCGGT ACTAAACCTC CTGA6TACGG TGATACACCT 1920 
1921 ATTCC666CT ATACTTATAT CAACCCTCTC GACG6CACTT ATCCGCCTGG TACTGA6CAA 1980 
1981 AACCCCGCTA ATCCTAATCC TTCTCTTGAG GAGTCTCAGC CTCTTAATAC TTTCATGnT 2040 
2041 CAGAATAATA 66TTCCGAAA TAGGCAG6GG GCATTAACTG THATACGGG CACTGHACT 2100 
2101 CAA6GCACTG ACCCCGTTAA AACTTATTAC CAGTACACTC CT6TATCATC AAAAGCCATG 2160 
2161 TATGACGCn ACTGGAACG6 TAAATTCAGA GACTGCGCTT TCCATTCTGG CTHAATGAA 2220 
2221 GATCCAHCG THGTGAATA TCAAGGCCAA TCGTCTGACC TGCCTCAACC TCCTGTCAAT 2280 
2281 6CTGGGGGCG GCTCTGGTGG TGGTTCTGGT GGCGGCTCTG AGGGTGGT6G CTCTGAGGGT 2340 
2341 GGCGGTTCTG AGGGTGGCGG CTCTGAGG6A GGCGGHGCG GTG6TGGCTC TGGHGCGGT 2400 
2401 GATTHGATT ATGAAAAGAT G6CAAACGCT AATAAGGGGG CTATGACCGA AAATGCCGAT 2460 
2461 GAAAACGCGC TACAGTCTGA CGCTAAA6GC AAACnGATT CTGTCGCTAC TGAHACGGT 2520 
2521 GCTGCTATCG ATGGTTTCAT TGGTGACGn TCCG6CCTTG CTAATGGTAA TGGTGCTACT 2580 
2581 GGTGATnTG CTG6CTCTAA TTCCCAAAT6 GCTCAAGTCG GTGACGGTGA TAAHCACCT 2640 
2641 HAATGAATA ATHCCGTCA ATATTTACCT TCCCTCCCTC AATCG6TTGA ATGTCGCCCT 2700 
2701 THGTCTTTA GCGCT6GTAA ACCATATGAA TTTTCTATTG AHGTGACAA AATAAACHA ■ 2760 
2761 TTCC6T6GTG- TCTTTGCGTT TCHTTATAT GHGCCACCT TTATGTATGT ATTTTCTACG 2820 
2821 THGCTAACA TACTGCGTAA TAAG6AGTCT TAATCATGCC AGTTCTTnG GGTAHCCGT 2880 . 
2881 TATTAHGCG TTTCCTCG6T TTCCTTCTGG TAACTTTGn CGGCTATCTG CnACTlTTC 2940 
2941 TTAAAAAGGG CnCGGTAAG ATAGCTATTG CTATnCAH GTTTCnGCT CHAHAnG 3000 
3001 GGCHAACTC AAHCTTGIG GGHATCTCT CTGATATTAG CGGTCAAHA CCCTCTGACP 3060 
3061 TTGTTCAGGG TGnCAGTTA AHCTCCCGT GTAAT6CGCT TCGCT6TTTT TATGHAnC 3120 
3121 TCTCTGTAAA GGCTGCTAH TTCA1TTTT6 ACGTTAAACA AAAAATCGTT TCnATrTGG 3180 
3181 ATTGGGATAA ATAATATGGC TGTTTATnT- GTAACTGGCA AATTAGGCTC T66AAAGAGG 3240 
3241 CTCGTTA6GG HGGTAAGAT TTA66ATAAA AHGTAGGIG GGT6CAAAAT AGGAACTAAT 3300 
3301 CnGAnTAA 6GCTTCAAAA CCTCCCGCAA 6TC66GAGGT TC6CTAAAAC GCCTCGCGn 3360 
3361 CHAGAATAC CGGATAAGCC HGTATATGI GATTTGCnG CTAnGGGGG CG6TAATGAT 3420 
-3421 TCCTACGATG AAAATAAAAA CGGCHGCn GTTCTC6ATG A6T6C6GTAC TT6GTTTAAT 3480 
3481 ACCCGHGn GGAATGATAA G6AAAGAGAG GCGAnATTG AnGGTTTCT ACATGCTC6T 3540 
3541 AAAHAGGAT GGGATAHAT CnGCnGTT CAGGAGTTAT CTAHGHGA TAAACAGGCG 3600 
3601 CGTTCTGCAT TA6CT6AACA TGITGHTAT TGTCGTCGTC T6GACAGAAT TACTnACCT 3660 
3651 TTTGTC6GTA CTTTATAHC TCTTATTACT GGCTCGAAAA TGCCTCTGCC TAAAHACAT 3720 • 
3721 GTTGGCGTTG HAAATATGG CGATTCTCAA TTAAGGCCTA CTGTTGAGCG TIGGCTTTAT 3780 
3781 ACTGGTAAGA AITTGTATAA CGCATATGAT ACTAAACAGG CTTTTTCTAG TAAHATGAT 3840

| 10 | 20 | 30 | 40 | 50 | 60
1 AAT6CTACTA CTATTAGTAG AATTGATGCC ACCTTTTCA6 CTCGCGCCGC AAATGAAAAT 60 
61 ATAGCTAAAC A66TTATT6A CCATTTGCGA AAT6TATCTA AT6GTCAAAC TAAATCTACT 120 
121 CGTTC6CAGA ATTG6GAATC AACTGHACA TGGAATGAAA CT.TCCAGACA CCGTACTTTA 180 
181 6TTGCATATT TAAAACATGT T6AGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 2l\0 
2^ TCTGCAAAAA TGACCTCTTA TCAAAA66AG CAATTAAAG6 TAgCTCTAA TCCTGACCTG 300 
foi TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAAC6CG ATATTT6AAS 360 
361 TGTTTCGG6C TTCCTCTTAA TCTTTTTGAT GCAATCC6CT TTGCTTCTGA CTATAATA6T ^20 
^21 CAG6GTAAA6 ACCTGATTTT T6ATTTATG6 TCATTCTC6T TTTCT6AACT GTTTAAAGCA m 
m TTTGAGGGGG ATTCAATGAA TATTTATGAC 6ATTCC6CAG TAHGGACGC TATCCAGTCT 5W 
stl AAACATTTTA CTATTACCCC CTCT6GCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600 
601 6GTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTC6T 660 
661 AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACT6 720 
721 ATGAATCTH CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTA6ATTTT 780 
781 TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGHCTTA AAATCGCATA AGGTAATTCA 840 
sSl CAATGATTAA AGHGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCJGGTGTTT 900 
901 CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGGTTTG nACGTTGAT TT6GGTAAT6 960 
gei'SATATCCGGT TCHGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT 6CGCCTGGTC 1020 
1021 T6TACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGHCCCTT ATGA7TWCC 1080 
1081 6TCT6CGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CG6ATTTC6A CACAATTTAT IIJO 
llSl CAGGCGATGA TACAAATCTC CGHGTACTT TGTTTCGCGC TTGGTATAAT C6CT6GGGGT 1200 
1201 CAAAGATGAG TGTTTTAGTG TATTCTHCG CCTCTnCGT TTTAGGTTGG JGCCTTCGTA 1260 
1251 GTGGCAHAC 6TATTTTACC CGHTAAIGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCi 1320 
1321 CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCT6 CT6AG6GTGA 1380s . 
1381 CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA 6CGACCGAAT AJATCGGTTA I'm 
mi TGCGTG6GCG ATGGTTGnG TCAHGTCGG CGCAACTATC GGTATCAAGC JGTTTAAGAA 150Q;v 
1501 ATTCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA 66CTCCTTTT GGA6CCTTTT 156te 
1551 TTTTTGGAGA TTTTCAACGT GAAAAAATTA HATTCGCAA TTCCTTTAGT TGTTCCTTTC 1520[ 
1521 TAHCTCACT CCGCTGAAAC TGTTGAAAGT TGTnAGCAA AACCCCATAC AGAAAATTCA 1680?. 
1681 TTTACTAACG TCT6GAAAGA C6ACAAAACT nAGATCGTT ACGCTAACTA T6AGG6TTGT 17J0 
17?! CTGTGGAATG CTACAG6CGT T6TAGTTTGT ACTGGTGACG AAACTCA6TG JTACGGTACA 1800 
1801 TGGGnCCTA TTGGGCnGC TATCCCTGAA AATGAGGGTG GTG6CTCTGA GG6TGGCGGT I860.; 
1861 TCTGAGGGTG GC6GTTCTGA GGGTG6CGGT ACTAAACCTC CTGAGTACG6 TGATACACCT 1920. 
1921 ATTCCGGGCT ATACnATAT CAACCCTCTC GACGGCACH ATCCGCCT66 TACJGAGCAA 1980; 
1981 AACCCC6CTA ATCCTAATCC TTCTCHGAG GAGTCTCA6C CTCTTAATAC TTTCAT6TTT 2m 
205l CA6AATAATA GGHCCGAAA TAG6CAGGG6 GCATTAACTG TTTATACGG6 CACTGTTACT 2100^ 
2101 CAAGGCACTG ACCCC6TTAA AACTTATTAC CAGTACACTC CTGTATCATC AAAAGCCAT6 2160?. 
2161 TATGACGCTT ACTGGAACG6 TAAATTCAGA GACTGCGCH TCCATTCT66 CTTTAATGAA 2220 
2221 GATCCATTCG HTGIGAATA TCAAGGCGAA TCGTCTGACC TGCCTCAACC TCCTGTCAAT 2280 
2281 GCT66CGGC6 GCTCTGGTGG TGGnCTGGT GGCGGCTCTG A6GGTG6T6G CTCT^AGGGT 23A0 
23S1 G6CGGTTCTG AG6GTG6CG6 CTCTGAGGGA GGCGGHCCG 6TGGTG6CTC JGGJTCCGGT 25OO 
mi GATTTT6ATT ATGAAAA6AT GGCAAACGCT AATAA6GG6G CTATGACCGA AAATGCCGAT 2460 
2461 GAAAACGC6C TACAGTCTGA CGCTAAAGGC AAACnGATT CT6TC6CTAC T6ATTACGGT 2520 
2521 6CTGCTATCG ATGGTrTCAT TGGT6ACGTT TCCGGCCHG CTAAT6GTAA TGGT6CTACT 2580 
2581 GGTGATTTTG CTGGCTCTAA HCCCAAATG GCTCAAGTCG GT6AC6GTGA JAATTCACCT 2540 
2541 TTAAT6AATA ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA ATGTCGCCCT 2700 
2701 TTTGTCTTTA GCGCIGGTAA ACCATATGAA TTTTCTAnG ATTGTGACAA AATAAACTTA 2760 
2751 rrCCGTGGTG TCHTGCGn TCTTTTATAT GHGCCACGT nATGTATGT ATTTTCTAC6 2820 
2821 TTTGCTAACA TACT6CGTAA TAA6GAGTCT TAATCATGCC AGnCTTTTG G6TATTCC6T 2880 
2881 TATTATTGCG TTTCCTCGGT TTCCnCTGG TAACTHGn CGGCTATCTG CTTACTTTTC 2940 
2941 7TAAAAA66G CHCGGTAAG ATAGCTAHG CTATTTCATT 6TTTCTT6CT CTTATTATT6 3000 
3001 GGGTTAACTC AATTCHGIG GGnATCTCT CTGATAHAG C6CTCAATTA CCCTCTGACT 3060 
3051 TTGTTCA6GG TGTTCAGTTA AnCTCCCGT CTAATGCGCT TCCCTGTTTT TAT6TTATTC 3120 
3121 TCTCT6TAAA GGCTGCTAH nCATTnTG ACGHAAACA AAAAATC6TT TCTTATTT66 3180 
^IRI aHgrgATAA ATAATATGGC TGTTTATTTT GTAACT6GCA AATTAG6CTC TGGAAA6AC6 3240 
324 CTCGTTAGCG TTG6TAAGAT TCAG6ATAAA ATTGTAGCTG GGT6CAAAAT A6CAACTAAT 3300 
3301 CTTGATTTAA GGCHCAAAA CCTCCCGCAA GTCGGGA66T TC6CTAAAAC GCCTC6CGTT 3360 
3361 CTTA6AATAC CGGATAAGCC TTCTATATCT GATTTGCTTG CTATTGGGeG CGGTAAT6AT 3420 
3421 TCCTAGGATG- AAAATAAAAA CGGC^^ GnCTGGATG AGTGCGGTAC nGGTTTAAT 3480 . 
3481 ACCG6TTCTT 66AAT6ATAA GGAAAGACAG CCGATTATT6 ATTGGTTTCT ACATGCTCGT 3540 
3541 AAATTAG6AT GGGATAHAT nTTCnGn CAGGACHAT CTATTGTTGA TAAACAGGC6 3600 
3501 C6TTCTGCAT TAGCTGAACA TGHGnTAT TGTCGTCGTC TGGACAGAAT TACTTTACCT 3660 
3661 TTTGTC66TA CTTTATATTC TCHATTACT GGCTCGAAAA JGCCTCTGCC TAAATTACAT 3720, 
3721 GTTG6CGTTG HAAATATGG C6ATTCTCAA nAAGCCCTA CTGHGAGCG nGGCTTTAT 3780

| 10 | 20 | 30 | 40 | 50 | 60
1 AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAAT6AAAAT 60 
51 ATA6CTAAAC AGGTTATTGA CCATHGCGA AATGTATCTA AT6GTCAAAC JAAATCTACT 120 
121 CGTTCGCAGA ATTGGGAATC AACTGHACA TG6AATGAAA CTJCCAGACA CCGTACTTTA 180 
181 GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CKTAAGCCA 2^0 
2i|l TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TGCTGACCTG 300 
301 nGGAGTTTG CTTCCGGTCT GGTTCGCnr GAAGCTCGAA HAAAACGCG ATATTTGAA6 360 
361 TCTTTC6GGC TTCCTCHAA TCTTTTTGAT GCAATCGGCT TTGCTTCTGA CTATAATAGT J20 
^^21 CAGGGTAAAG -ACCTGATTTT TGATTTATGb TCAITCTCGJ TTTCTGAACT GTTTAAAGCA J80 
mi TTTGAGGGGG ATTCAATGAA TATHATGAC GAHCCGCAG JATTGGACGC TATCCAGTg SJO 
5^1 AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600 
601 GGTTTTTATC GTCGTCTGGT AAACGAGGGT TAT6ATAGTG JTGCTCTTAC TAT6CCTC6T 560 
661 AATTCCHTT GGCGnATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACT6 720 
721 ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780 
781 TCTTCCCAAC GTCCTGACT6 GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840 
8m CAATGAHAA AGHGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 900 
901 CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTT6 TTAGGTTGAT TTGGGTAATG 960^ 
961-AATATCCGGT TCTTGTCAAG ATTACTCHG ATGAAGGTCA 6CCAGCCTAT GCGCCT6GTC 1020 
1021 TGTACACCGT TCATCTGTCC TCTHCAAAG TTGGTCAGTT CGGnCCCTT ATGAJT6ACC 1080 
1081 GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140
1141 CAGGCGATGA TACAAATCTC CGTTGTACn TGTnCGGGC TTGGTATAAT CGCTGGGGGT 1200
1201 CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG T6CCTTCGTA 1260 
1261 GIGGCAHAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 1320 
1321 CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCT6 TCTTTCGCTG CTGAGGGT6A 1380 
1381 CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGHA 1440
1441 T6CGTGGGCG ATGGTTGTTG TCAHGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 1500 
1501 AnCACCTGG AAAGCAAGCT GATAAACCGA TACAATTAAA 6GCTCCTTTT GGA6CCTTTT 1560 
1561 TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTAHCGCAA nCCTTTAGT TGTTCCTTTC 1620
1621 TATTCTCACT CC6CTGAAAC TGTTGAAAGT TGrTTAGCAA AACCCCATAC AGAAAATTCA 1680 
1681 HTACTAACG TCTGGAAAGA CGACAAAACT HAGATCGIT ACGCTAACTA TGAGGGTTGT 1740 
1741 CTGTGGAATG CTACAGGCGT IGTAGTHGT ACTGGTGACG AAACTCAGTG JTACGGTACA 1800 
1801 TGGGTTCCTA TTG6GCTTGC TATCCCTGAA AATGAGGGTG GT6GCTCTGA GGGTG6GG6T 1860 
1861 TCTGAGGGTG GCGGHCTGA GGGTGGCGGT ACTAAACCTC GTGAGTACGG TGATACACCT 1920 
1921 ATTCCGGGCT ATACHATAT CAACCCTCTC GACGGCACH ATCC6CCTGG TACT6AGCAA 1980 
1981 AACCCCGCTA ATCCTAATCC nCTCHGAG GAGTCTCA6C CTCTTAATAC TTTCATGTTT 2040 
2041 CAGAATAATA GGHCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGG6 CACT6TTACT 2100 
2101 CAAG6CACTG ACCCCGHAA AACHAHAC CAGTACACTC CTGTATCATC AAAAGCCATG 2160 
2161 TATGACGCn ACTGGAAGGG TAAAHCAGA GACTGCGCH TCCATTCTGG CTTTAATGAA 2220 
2221 GATCCAHCG THGIGAATA TCAAGGCCAA TCGTCTGACC JGCCTCAACC TCCT6TCAAT 2280 
2281 6CTGGCG6CG C6TCTGGTGG TGGnCTGGT 6GCGGCTCTG AGG6TGGTGG CTCTGA6GGT 2340 
2341 GGCGGHCTG A6GGTGGCG6 CTCTGAGGGA 6GCGGTTCCG GTGGTGGCTC TGGTTCCG6T 2400 
2401 GATTTTGATT ATGAAAAGAT GGCAAACGCT AATAAGGGGG GTATGACCGA AAATGCC6AT 2460 
2461 GAAAACGCGC TACAGTCTGA CGCTAAAGGC AAACnGATT CTGTCGCTAC TGATTAC66T 2520 
2521 GCTGCTATCG ATGGTTTCAT TGGTGACGH TCCGGCCHG CTAATG6TAA TGGTGGTACT 2580 
2581 GGTGATrrTG CTGGCTCTAA HCCCAAATG GCTCAAGTC6 6T6AC66TGA TAATTCAGCT 2640 
2641 HAATGAATA ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA ATGTCGGCCT 2700
2701 TrTGTGTTTA GGGCTGGTAA AGCATATGAA TTTTGTAnG AHGTGACAA AATAAACTTA 2760 
2761 frCCGTGGTG TGTTTGGGn TGTnTATAT GnGGCACGT TTATGTATGT ATHTCTACG 2820
2821 TTTGGTAAGA TAGTGCGTAA TAAGGAGTGT TAATCATGGC AGnGTTTTG 6GTATTCCGT 2880 
2881 TATTAnGGG TriGGTCGGT nGGHGTGG TAAGHTGIT CGGGTATGTG CnAGTmC 2940
2941 TTAAAAAGGG CHCGGTAAG ATAGCTAHG CTATnCAn GmcnGCT CTTATTATTG 3000 
3001 GGCTTAACTG AAnGTTGTG GGTTATGTGT GTGATAHAG GGGTGAATTA CCCTCTGACT 3060 
3061 TTGTTCA66G TGTTCAGnA AHCTCCCGT CTAATGCGCT TCCCTGTTTT TATGTTATTC 3120 
3121 TCTCT6TAAA GGCTGCTAH nCATTTTTG ACGHAAACA AAAAATCGTT TCTTATTT6G 3180 
3181 ATTGGGATAA ATAATATGGC TGTnATTTT GTAACTGGCA AAHAGGgC TGGAAAGACG 3240
3241 CTCGHAGCG HGGTAAGAT HAGGATAAA AHGTAGCTG G6T6CAAAAT A6CAACTAAT 3300 
3301 CTTGATrTAA GGCHGAAAA GCTGGCGGAA GTCGGGAGGT JCGCTAAAAC GCCTCGCGTT 3360
3361 CTTAGAATAC GGGATAAGGG TTGTATATGT GATriGGnG GTATTGGGCG CGGTAATGAT 3420
3421 TCCTACGATG AAAATAAAAA CGGCTTGCTT GHCTCGATG AGT6CGGTAC nGGHTAAT 3480 
3481 ACCCGTTCTT GGAATGATAA GGAAAGACAG CCGAnATTG ATTGGjna ACATGCTCGT 3540
3541 AAATTAGGAT GGGATATTAT TTTTCTTGn CAGGACHAT CTATTGTTGA TAAACAGGC6 3600 
3601 CGTTCTGCAT TA6CTGAACA TGnGTrTAT TGTCGTCGTC TGGACAGAAT TACTTTACCT 3660 
3661 TTTGTCGGTA CTTTATATTC TCTTATTACT GGCTCGAAAA TGCCTCTGCC TAAATTACAT 3720
3721 GTT6GCGTTG TTAAATATGG CGAHCTCAA HAAGCCCTA CTGTTGAGCG TTG6CTTTAT 3780 
3781 AGTGGTAAGA ATTTGTATAA GGCATATGAT ACTAAAGAGG CmTTCTAG TAATTATGAT 3840 
I 10 I 20 I 30 I 40 I 50 I 60
1 AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTHCAG CTCGCGCCCC AAATGAAAAT 60
61 ATAGCTAAAC AG6TTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120 
121 CGTTCGCAGA ATT6GGAATC AACT6TTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180 
181 GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 240
241 TCTGCAAAAA TGACCTCTTA TGAAAAGGAG CAAJTAAAGG TACTCTCTAA TCCTGACCTG 300
301 TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360
361 TCTTTCGGGC TTCCTCHAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 420
421 CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480
481 TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540
541 AAACATTTTA CTAHACCCC CTCTGGCAAA ACTTCTHTG CAAAAGCCTC TCGCTATTTT 600
601 GGrnTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC jmCCKGT 660
661 AATTCCTTTT GGCGTTATGT ATCTGCATTA GHGAATGTG GTATTCCTAA ATCTCAACTG 720 
721 ATGAATCm CTACCTGTAA TAATGTTGn CCGnAGTTC 6TTTTATTAA CGTAGATTTT 780 
781 TCnCCCAAG GTCCTGACTG 6TATAATGAG CCAGTTCnA AAATGGCATA AGGTAATTCA 840 
841 CAATGATTAA AGHGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGT6TTT 900 
901 CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTHG TTACGTTGAT TTGGGTAATG 960
961 AATATCCGGT TCTTGTCAAG ATTACTCHG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 1020
1021 TGTACACCGT TCATCTGTCC TCTTTCAAAG nGGTCAGTT CG67TCCCTT ATGATTGACC 1080 
1081 GTCTGCGCCT CGnCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140 
1141 CAGGCGATGA TACAAATGTC CGTTGTACn TGTTTCGCGC TTGGTATAAT CGCTGGGG6T 1200 
1201 CAAAGATGAG TGTTTTAGTG TAHCTTTCG CCTCTTTCGT TTTAGCnGG TGCCHCGTA 1260
1261 GTGGCAHAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 1320 
1321 CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCT6 TCTTTCGCTG CTGAG6GTGA 1380 
1381 CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGHA 1440 
1441 TGCGTGGGCG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 1500 
1501 ATTCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTCCHTT GGAGCGTTTT 1560
1561 mrrGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCHTAGT TGnCGinC 1620 
1621 TAnCTCACT CCGCTGAAAC TGHGAAAGT TGTnAGCAA AACCCCATAC AGAAAATTCA 1680 
1681 ITTACTAACG TCTGGAAAGA C6ACAAAACT mGATCGTT ACGCTAACTA TGAGGGTTGT 1740 
1741 CTGTGGAATG CTACAGGCGT TGTAGniGT ACTGGTGACG AAACTCAGTG HACGGTACA 1800 
1801 TGGGTTCCTA nGGGCTTGC TATCCCT6AA AATGAGGGTG GTGGCTCTGA GGGT6GC6GT 1860 
1861 TCTGAGGGTG GCGGHCTGA GGGTGGCGGT ACTAAACCTC CTGA6TACGG TGATACACCT 1920 
1921 ATTCCGGGCT ATACHATAT CAACCCTCTC GACGGCACH ATCCGCCTGG TACTGAGCAA 1980 
1981 AACCCCGCTA ATCCTAATCC TTCTCTTGAG GAGTCTCAGC CTCTTAATAC TTTCAT6TTT 2040 
2041 CAGAATAATA GGTTCCGAAA TAGGCAGGGG GCAHAACTG TTTATACGGG CACTGTTACT 2100 
2101 CAAGGCACTG ACCCCGTTAA AACnATTAC CAGTACACTC CTGTATCATC AAAAGCCATG 2160
2161 TATGACGCn ACTGGAACGG TAAATTCAGA GACTGCGCH TCCATTCTGG Cmmm 2220 
2221 GATGCAHCG TTTGTGAATA TCAAGGCCAA TCGTCTGACC T6CCTCAACC TCCT6TCAAT 2280 
2281 GCTGGCGGCG GCTCTGGTGG TGGTTCTGGT GGCGGCTCTG AGGGTG6TGG CTCTGA6G6T 2340 
2341 GGCGGHCTG AGGGTGGC6G CTCTGAGGGA GGCGGHCCG GTGDT6GCTC TGGTTCCGGT 2400 
2401 GAnTTGAn AT6AAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA AAATGCCGAT 2460 
2461 GAAAACGCGC TACAGTCTGA CGCTAAAGGC AAACHGAH CTGTCGCTAC TGATTACGGT 2520
2521 6CT6CTATCG ATGGTTrCAT TGGTGACGH TCCGGCCnG CTAAT6GTAA T6GTGCTACT 2580 
2581 GGTGATnTG CTGGCTCTAA nCCCAAATG 6CTCAA6TCG GTGAC6GTGA TAATTCACCT 2640 
2641 TTAATGAATA ATTTCCGTCA ATATITACCT TCCCTCCCTC AATCGGTTGA AT6TC6CCCT 2700 
2701 niGTCHTA GCGCTGGTAA ACCATATGAA TTnCTATTG ATTGTGACAA AATAAAGTTA 2760
2761 nCCGTGGTG TCTTTGCGn TCmTATAT GTT6CCACCT TTATGTATGT ATTTTCTACG 2820 
2821 TTTGCTAACA TACTGC6TAA TAA6GAGTCT TAATCATCa AGnCTTTTG 66TA TTCG6 T 2880 
2881 TAHAHGCG TTTCCTCGGT nCCTTCTGG TAAmTGH CGGCm^^ CnACTTTTC 2940 
2941 HAAAAAGGG CHCGGTAAG ATAGCTAHG CTATTTCATT GTTTCTTGCT CnATTATTG 3000 
3001 GGCTTAACTC AAHCTTGIG GGTTATCTCT CTGATATTAG CGCTCAATTA CCCTCTGACT 3060
3061 HGHCAGGG TGnCAGHA AHC TCCCG T CTAATGC6CT TCCCT6TTTT TATGTTAnC 3120 
3121 TCTCTGTAAA GGCTGCTATT nCATnTTG ACGHAAACA AAAAATCGH TCnATTTGG 3180 
3181 AHGGGATAA ATAATATGGC TGinAniT 6TAACT6GCA AAHAGGCTC TG6AAAGACG 3240 
3241 CTCGTTAGCG HGGTAAGAT TCA6GATAAA AHGTAGGTG 66TGCAAAAT A6GAAGTAAT 3300 
3301 CTTGATTTAA GGCTTGAAAA CCTCCCGGAA GTCGGGAGGT TGGGTAAAAC GCGTGGGGTT 3360
3361 CTTAGAATAC C66ATAAGGG HGTATAIGT GATnGGHG GTAnGGGCG C6GTAATGAT 3420 
3421 TCCTACGATG AAAATAAAAA CGGCTTGGn GTTCTCGATG AGTGCGgAC TTGGmAAT 3480
3481 ACCCGnCTT GGAATGATAA GGAAAGACA6 CC6ATTATT6 ATT66TTTCT ACAT6CTC6T 3540 
3541 AAATTA6GAT GGGATAHAT CTTCCnGn CAGGACHAT CTAnGHGA TAAACAGGC6 3600 
3601 CGTTCT6CAT TAGCTGAACA TGnGHTAT TGTCGTC6TC TG6ACAGAAT TACTTTACCT 3660 
3661 niGTCGGTA CTTTATAHC TCnAHACT 6GCTC6AAAA TGCCTCTGCC TAAATTACAT 3720 
3721 GHGGCGTTG HAAATATGG CGATTCTCAA HAAGCCCTA CTGTTGAGCG TTGGCTTTAT 3780 
3781 ACTGGTAAGA AHTGTATAA CGCATATGAT ACTAAACAGG CTTTTTCTAG TAATTATGAT 3840 
I 10 I 20 I 30 I 40 I 50 I 60
1 AATGCTACTA CTATTAGTAG AAHGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60 
61 ATAGCTAAAC AG6TTATT6A CCATTT6CGA AATGTATCTA ATGGTCAAAC TAAATGTACT 120 
121 C6TTCGCAGA ATTGGGAATC AACTGHACA TGGAATGAAA CTTCCAGACA CCGTACriTA 180 
181 GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGAHC AGCAATTAAG GTCTAAGCCA 240
241 TCT6CAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAG6 TACTCTCTAA TCCTGACCT6 300 
301 nGGAGTHG CnCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360
361 TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCIATCCGCT TTCGTTCTGA CTATAATAGT 420 
421 CAGG6TAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTHAAAGCA 480 
481 TTTGAGGG6G ATTCAATGAA TATTTATGAC GATTCC6CAG TATTGGACGC TATCCA6TCT 540 
541 AAACATTTTA CTAnACCCC CTCTGGCAAA ACTTCTTTT6 CAAAAGCCTC TCGCTATTtr 600 
601 GGTHTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660
661 AAnCCTHT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720 
721 AT6AATCTTT CTACCTGTAA TAATGHGTI CCGmGTTC GmTAnAA CGTA6ATTTT 780 
781 TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840
841 CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTC6T TCTGGTGTTT 900 
901 CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTHG TTAC6TTGAT TTGGGTAATG 960 
961 AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGGCTGGTC 1020
1021 T6TACACC6T TCATCTGTCC TCTTTCAAAG nGGTCAGTT GGGTTCCCTT ATGAHGACC 1080 
1081 GTCTGCGCCT CGnCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140
1141 CAGGCGAT6A TACAAATCTC GGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGG6T 1200 
1201 CAAAGATGAG TGTTTTAGTG TATTCHTCG CCTCTTTGGT TTTAGGnGG TGCCTTCGTA 1260 
1261 6T66CATTAC 6TATTTTACC CGTTTAATG6 AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 1320 
1321 CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCC6ATGCTG TCTTTCGCTG CTGAGG6T6A 1380 
1381 CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATGGGHA 1440 
1441 TGCGTGGGCG ATGGTTGnG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTHAAGAA 1500 
1501 ATTCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGCCTTn 1560 
1561 nrnGGAGA HTTCAACGT GAAAAAAHA TTATTCGCAA nCCTTTAGT TGTTCCTTTC 1620 
1621 TAHCTCACT CCGCTGAAAC TGTTGAAAGT TGnTAGCAA AACCGCATAC AGAAAATTCA 1680
1681 TTTACTAACG TCTGGAAAGA GCACAAAACT nAGATCGTT AGGCTAACTA TGAGGGHGT 1740 
1741 CTGTGGAAT6 CTACAGGCGT TGTAGTTTGT ACTGGTGACG AAACTCAGTG TTACGGTACA 1800 
1801 TGGGnCCTA nGGGCHGC TATCCCTGAA AATGAG6GTG GTGGCTCTGA GGGTG6CS6T 1860 
1861 TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT ACTAAACCTC CTGAGTACGG TGATACACCT 1920
1921 AnCCGGGCT ATACHATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG TACT6A6CAA 1980 
1981 AACCCCGCTA ATCCTAATCC nCTCTTGAG GAGTCTCA6C CTCHAATAC nTCATGTTT 2040 
2041 CA6AATAATA GGHCCGAAA TAGGCAGGGG GCAHAACTG 7TTATACG6G CAasnACT 2100 
2101 CAAG6CACT6 ACCCCGnAA AACTTATTAC CAGTACACTC CTGTATCATC AAAAGCCAT6 2160 
2161 TATGACGCn ACTGGAAC6G TAAAHCAGA GACTGCGCTT TCCAnCTGG C1TTAAT6AA 2220 
2221 GATCCAHCG TTTGTGAATA TCAAGGCCAA TCGTCTGACC TGeCTCAACC TCaCTCAAT 2280 
2281 GCTGGCGGCG GCTCT6GTGG TGGnCTGGT G6CG6CTCTG AGGGTGGTGG CTCTGAG6GT 2340 
2341 66CGGTTCTG AGGGTGGCGG CTCTGAGGGA GGCGGHCCG GT6GTGGCTC IGGHCCGGT 2400 
2401 GATTrTGAn ATGAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA AAAT6CC6AT 2460 
2461 GAAAACGCGC TACAGTCTGA CGCTAAAGGC AAACHGAH CTGTCGCTAC TGAHACGGT 2520
2521 GCTGCTATC6 ATGGTTTCAT TGGTGACGH TCCGGCCTTG CTAATGGTAA TGGT6CTACT 2580 
2581 66TGATTTT6 CT6GCTCTAA TTCCCAAATG GCTCAA6TC6 GTGAC6GT6A TAATTCACCT 2640 
2641 TTAATGAATA ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGHGA AT6TC6CCCT 2700 
2701 TrrGTCTTTA GCGCTGGTAA ACCATATGAA TnTCTATTG AHGIGACAA AATAAACHA 2760 
2761 TTCCGTG6TG TCriTGCGn TGnTTATAT GnGCCACCT nATGTATGT ATnTGTACG 2820 
2821 TTTGCTAACA TAaGCGTAA TAAGGAGTCT TAATCATGCC AGnCTTTTG GGTAHCCGT 2880 
2881 TATTAnGCG TTTCCTCGGT nCCnCTGG TAACTTTGn CGGCTATCTG CnACTTnC 2940 
2941 HAAAAAGGG CHCGGTAAG ATA6CTATT6 CTATTTCAn GITTCnGCT CnAnATTG 3000 
3001 GGCHAACTC AAHCHGIG GGTTATCTCT CTGATAHAG CGCTCAAHA CCCTCTGACT 3060 
3061 nGTTCAGGG TGHCAGHA AnC TCCCG T CTAAT6C6CT TCCCTGTTn. TATGTTATTC 3120 
3121 TCTCTGTAAA GGCTGCTAH TTCATTTTTG ACGHAAACA AAAAATCGn TCHATnGG 3180 
3181 ATTG6GATAA ATAATATGGC TGnTATTTT GTAACTGGCA AATTA66CTC T66AAAGACG 3240 
3241 CTCGHAGCG nCGTAAGAT HAGGATAAA ATTGTAGCTG GGTGGAAAAT AGCAACTAAT 3300
3301 CTTGATTTAA GGCHCAAAA CCTCGCGCAA GTCGGGAGGT TCGCTAAAAC GCeTCGCGTT 3360
3361 CnAGAATAC CGGATAAGCC nCTATATCT GATTTGCnG CTATTGGGCG CGGTAATGAT 3420 
3421 TCCTACGATG AAAATAAAAA CGGCTTGCTT GTTGTGGATG AGTGGGGTAG TTGGTTTAAT 3480
3481 ACCCGncn GGAATGATAA GGAAAGACAG CCGAHAnG AnGGTTTCT ACAT6CTCGT 3540 
3541 AAATTAGGAT GGGATAHAT TTTTCTTGn CAGGACHAT CTAnGTTGA JAAACAGGCG 3600
3601 CGTTCTGCAT TAGCTGAACA TGTTGTTTAT TGTCGTCGTC TGGACAGAAT TACTTTAeCT 3660 
3661 TTTGTCGGTA CHTATAnC TCHAnACT GGCTCGAAAA TGCCTCTGCC TAAAHACAT 3720 
3721 GHGGCGHG HAAATATGG CGATTCTCAA nAAGCCCTA CT6TTGAGCG nGGCnTAT 3780 
3781 ACTGGTAAGA A1TTGTATAA CGCATATGAT ACTAAACAGG CTTTTTCTAG TAAnATGAT 3840 